Predictive Analytics 101 – The Real Business Intelligence, part 2

by Ashith Bolar,  Director AmBr Data Labs @ Amick Brown

In my first post, “The Real Business Intelligence”, I emphasized the significance of Predictive Analytics in the Business Intelligence space. Let us take a deeper look at Predictive Analytics by way of more concrete examples.

As a refresher, Predictive Analytics is a set of statistical and mathematical tools and techniques for analyzing historical data and subsequently predicting the future. The basic premise is that by analyzing historical data and determining relationships, more specifically correlations, between related (and sometimes seemingly unrelated) attributes and entities, one can derive significant insights into a system. These insights can then be used to make predictions.

Let’s take a look at this process step by step.

The fundamental component in Predictive Analytics is a Predictive Model, or just model. A Predictive Model is a set of data points plus a series of algorithms working on that data. It attempts to capture the relationships between the data points by applying mathematical or statistical computations, deployed as algorithms.

The output of a model is typically a single number, called a score. The score is essentially a quantitative value for a specific prediction made by the model based on historical data. The higher the score, the more likely a certain behavior is predicted to be. The lower the score, the more likely the opposite behavior is predicted to be.
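To make the idea of a score concrete, here is a minimal sketch in Python. It assumes a logistic function as the scoring formula, which is only one of many possible choices, and the weights, bias, and feature values are hypothetical numbers standing in for whatever a real model would learn from historical data.

```python
import math


def score(features, weights, bias):
    """Return a score between 0 and 1; higher means the behavior is more likely."""
    # Weighted sum of the inputs, squashed into the range (0, 1).
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical weights, as if they had been learned from historical data.
WEIGHTS = [0.8, -0.5]
BIAS = -0.2

# Two hypothetical customers described by two numeric attributes each.
likely = score([3.0, 1.0], WEIGHTS, BIAS)
unlikely = score([0.0, 4.0], WEIGHTS, BIAS)

print(f"likely customer:   {likely:.2f}")
print(f"unlikely customer: {unlikely:.2f}")
```

The first customer ends up with a score above 0.5 and the second with a score below it, which is all the model is saying: one behavior is predicted to be more likely than its opposite.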

Predictive models can be built for a wide variety of problems. But the most common predictive models, especially in the context of a business application, are those that predict people’s behaviors. Predictive models are designed to predict how people behave under new circumstances, given what we know about how they behaved in the past under other known circumstances. Take Netflix’s movie recommendations, for instance: based on the movies that you have seen and rated highly (known circumstances), recommendations for new movies (unknown circumstances) are generated.

You will hear terms like “Machine Learning”, “Artificial Intelligence”, “Regression Analysis”, etc. While each one is an independent area of mathematics and computing, for a Predictive Analytics suite these are just different algorithms (computing models) that are employed in the process of predicting.

Let’s dig deeper into the concept of Scores. Let’s take two classical examples of scores generated on customers.

  1. Based on the movies you have seen and rated in the past, Netflix tries to determine whether you will like a new movie or not. Say, for instance, the score ranges from 0 to 10: 10 being a prediction that you will absolutely love the new movie, and 0 being a prediction that you will absolutely not care for it. This type of score is called a Probability Score. In essence, the score tells you the probability of you liking a movie.
  2. Another type of score is called the Quantitative Score. Here the prediction is not the probability of whether you will like the movie or not; instead, the model predicts the amount of something. For instance, Life Insurance companies try to predict how long a certain customer will live based on the life choices and other circumstances of the customer.

In the case of the Netflix model (Probability Score), if a customer gets an 8 (out of 10) for the likelihood of liking a particular movie, it can be rephrased as “There’s an 80% chance that the customer will like this movie, and a 20% chance that they will not like it”. Such a prediction is based on the spread of probabilities (a probability distribution). Another way of looking at this score of 8 (out of 10) is: “The customer might not absolutely love this movie (which would be a 10/10), but will definitely not absolutely hate it (0/10). Instead, the customer is more likely to actually like the movie to some extent (8/10) than to be indifferent to it (5/10).”
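The arithmetic behind that rephrasing is simple enough to write down. The sketch below is illustrative only; the function name `score_to_probability` and the 0 to 10 scale are assumptions taken from the example above, not anything Netflix publishes.

```python
def score_to_probability(score, max_score=10):
    """Convert a score on a 0..max_score scale into a probability between 0 and 1."""
    if not 0 <= score <= max_score:
        raise ValueError("score must lie between 0 and max_score")
    return score / max_score


p_like = score_to_probability(8)   # chance the customer likes the movie
p_dislike = 1 - p_like             # chance the customer does not

print(f"like: {p_like:.0%}, not like: {p_dislike:.0%}")
```

The two probabilities always add up to 1, which is why a score of 8 out of 10 carries the 20% chance of the opposite outcome along with it.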

In either case, a careful examination of this score tells us that all the system is doing is categorizing people’s behaviors into a fixed number of ranks. Therefore, predictive models which generate probability scores are usually called classification models. The quantitative score (the predicted life expectancy of an insurance customer), on the other hand, is an actual quantity rather than a rank. Another classic example is customer spend, which is how much a customer is willing to pay for a new product or service. The actual value is arrived at by means of various statistical and mathematical computations. These models are typically known as regression models.
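The difference between the two kinds of models can be sketched in a few lines of Python. This is a toy illustration under stated assumptions: the classifier simply applies a threshold to a probability score, the regression is an ordinary least-squares straight line, and the visit and spend figures are made up for the example.

```python
from statistics import mean


def classify(probability, threshold=0.5):
    """Classification: turn a probability score into one of a fixed set of labels."""
    return "will like" if probability >= threshold else "will not like"


def fit_line(xs, ys):
    """Regression: least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Classification output is a category.
label = classify(0.8)

# Regression output is a quantity. Hypothetical history:
# number of store visits per month and the amount spent.
visits = [1, 2, 3, 4, 5]
spend = [12.0, 19.0, 31.0, 42.0, 48.0]

slope, intercept = fit_line(visits, spend)
predicted_spend = slope * 6 + intercept  # prediction for a customer with 6 visits

print(label)
print(f"predicted spend for 6 visits: {predicted_spend:.2f}")
```

The classifier can only ever answer with one of its two labels, while the fitted line can return any amount, including values that never appeared in the historical data.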

A good predictive model might not be accurate in every single case, but given a large set of data (read: target customers), the behavior observed across the whole set tends toward the average that the model predicted.
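A small simulation illustrates this point. It is a sketch under a strong assumption: that the model is well calibrated, meaning a customer with a predicted probability p really does like the movie with probability p. The customer count and the range of probabilities are arbitrary choices made for the example.

```python
import random

random.seed(42)  # fixed seed so the sketch is repeatable

N_CUSTOMERS = 50_000

# Hypothetical predicted probabilities that each customer likes a movie.
predicted = [random.uniform(0.1, 0.9) for _ in range(N_CUSTOMERS)]

# Simulated outcomes, ASSUMING the model is well calibrated:
# a customer with predicted probability p likes the movie with probability p.
liked = [random.random() < p for p in predicted]

# Individual level: treat p >= 0.5 as "will like" and count the wrong calls.
misses = sum((p >= 0.5) != outcome for p, outcome in zip(predicted, liked))
miss_rate = misses / N_CUSTOMERS

# Aggregate level: compare the average prediction with the observed like rate.
mean_predicted = sum(predicted) / N_CUSTOMERS
observed_rate = sum(liked) / N_CUSTOMERS

print(f"individual calls that were wrong: {miss_rate:.1%}")
print(f"mean predicted probability:       {mean_predicted:.3f}")
print(f"observed fraction who liked it:   {observed_rate:.3f}")
```

A sizeable share of the individual calls come out wrong, yet the observed fraction of customers who liked the movie lands very close to the average predicted probability.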

In the following posts, we will delve deeper into predictive algorithms, and try to gain a better understanding of how they work, and more importantly why they work, and why they are important in your corporate strategy.