Propensity predictions estimate the likelihood that an individual will perform a defined action (buy, buy again, churn, etc.), enabling businesses to grow and retain revenue by engaging the right customers, leads, or audiences at the right time. To get accurate propensity predictions, you need a good propensity model. Here’s how Faraday approaches propensity modeling.
Our prediction system classifies individuals against outcomes using an array of predictive algorithms, including individual decision trees, random decision forests, logistic regressions, and neural networks. At run time, we determine which algorithm works best for each case. Usually it’s random decision forests.
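The algorithm-selection step can be sketched with scikit-learn. This is a minimal illustration, not Faraday's actual system: the candidate models, synthetic data, and ROC AUC scoring metric are all assumptions made for the example.

```python
# Sketch: trying several classifiers and keeping the best one,
# assuming a scikit-learn workflow on synthetic conversion data.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic "did the customer convert?" data.
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)

candidates = {
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "random_forest": RandomForestClassifier(random_state=0),
    "logistic_regression": LogisticRegression(max_iter=1000),
}

# Score each candidate with cross-validated ROC AUC and keep the winner.
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
    for name, model in candidates.items()
}
best = max(scores, key=scores.get)
print(best, round(scores[best], 3))
```

On data like this, the tree ensemble usually comes out ahead, matching the pattern described above.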
Random decision forests are made up of individual decision trees, which are classifier algorithms that look like flow charts, showing the choices made to reach a certain outcome.
Predictions aren’t based on a single tree, though; we use dozens of trees to improve the accuracy of the algorithm, hence the random decision forest. We look at the universe of trees, their splits and branches, and choose the ones that are most promising at creating a path toward the desired outcome. That path leads from seemingly arbitrary data to a classification: the “yes” or “no” on the outcome. In other words, we’re deciding whether someone is a “good potential customer.”
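A single tree from such a forest really does read like a flow chart. Here is a hedged sketch using scikit-learn, where the feature names (`income`, `home_value`, etc.) are invented for illustration:

```python
# Sketch: print one tree from a random forest as the flow chart
# described above. Feature names are hypothetical.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import export_text

X, y = make_classification(n_samples=500, n_features=4, random_state=1)
forest = RandomForestClassifier(n_estimators=50, max_depth=3, random_state=1)
forest.fit(X, y)

# Each estimator in the forest is an individual decision tree.
rules = export_text(
    forest.estimators_[0],
    feature_names=["income", "home_value", "age", "tenure"],
)
print(rules)
```

Each branch is a choice on an attribute, and each leaf is the classification the path arrives at.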
While the classification seems binary, in reality there’s often no clear “yes” or “no.” We are dealing with the likelihood that someone performs an action, and that likelihood is never definitively 100% or 0%. During the prediction process, we employ propensity scores, which are a way of saying, “This person is more likely to be a good potential customer than not.” Every individual in the group we’re looking at is scored, and those in the top percentiles are recommended to the client as the best fits for the outcome they want to realize.
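In scikit-learn terms, this is the difference between a hard class label and a predicted probability. The sketch below assumes a fitted random forest and a top-10% cutoff, both of which are illustrative choices, not Faraday's actual thresholds:

```python
# Sketch: turning forest outputs into propensity scores and keeping
# the top percentile of a scored group.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=8, random_state=2)
forest = RandomForestClassifier(random_state=2).fit(X, y)

# predict_proba gives the likelihood of the "yes" class, not a hard 0/1.
propensity = forest.predict_proba(X)[:, 1]

# Recommend the individuals whose scores fall in the top 10%.
cutoff = np.percentile(propensity, 90)
recommended = np.flatnonzero(propensity >= cutoff)
print(len(recommended))
```

Everyone gets a score between 0 and 1; only those above the chosen percentile are surfaced as recommendations.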
Now, if there are so many predictive algorithms we can use, why do we choose random decision forests?
Well, for one, they’re the most explainable: with so many decision trees at play, we get a strong sense of the feature importance at each node split in a tree. These importances show us which attributes contribute most toward the success of an outcome, and which contribute most toward failure.
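Those split-based importances are exposed directly by scikit-learn's forest implementation. In this sketch the attribute names are made up; only the mechanics (importances derived from splits, summing to 1) are the point:

```python
# Sketch: reading split-based feature importances from a fitted forest.
# Attribute names are hypothetical, not a real schema.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(
    n_samples=800, n_features=5, n_informative=3, random_state=3
)
names = ["income", "home_value", "age", "tenure", "region_code"]
forest = RandomForestClassifier(random_state=3).fit(X, y)

# Importances sum to 1; higher means the attribute drove more of the
# splits that separated converters from non-converters.
ranked = sorted(
    zip(names, forest.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, importance in ranked:
    print(f"{name}: {importance:.3f}")
```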
The random decision forest technique also handles missing data very well, especially compared to a logistic regression or neural network, both of which typically require complete values to make predictions. In a decision tree, if a value is missing, the record moves on to another decision path built to handle that case. When you scale that to a whole random decision forest, the prediction stays accurate and isn’t perturbed by missing data.
A third reason we use random decision forests is how they handle collinearity. Random decision forests may see that some data are correlated — say, mortgage value and income. Many of the attributes the algorithm deals with are linearly related but don’t necessarily drive a certain customer behavior directly.
A random decision forest algorithm finds areas of greatest information gain: it may recognize interdependence between attributes, but it will choose the feature that gives it the greatest leverage over the prediction. This is particularly helpful for us, as we deal with tremendous amounts of customer data, and data on humans is generally noisy and often subverts expected distributions and assumptions.
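Information gain itself is simple to compute by hand: it is the drop in entropy from a parent node to its children after a split. This toy example on made-up yes/no labels shows the criterion a tree optimizes:

```python
# Sketch: the information-gain criterion a tree uses to pick a split,
# computed by hand on a toy yes/no outcome.
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(parent, left, right):
    """Entropy reduction achieved by splitting parent into left/right."""
    n = len(parent)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted

# A split that cleanly separates converters recovers the full bit
# of entropy in a 50/50 parent node.
parent = ["yes"] * 5 + ["no"] * 5
gain = information_gain(parent, ["yes"] * 5, ["no"] * 5)
print(round(gain, 3))  # → 1.0
```

Given two correlated attributes, the split on whichever one yields higher gain is taken first, which is why the forest leans on the more informative of the pair.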
Propensity predictions are especially helpful in improving campaign targeting across a number of marketing channels. By identifying individuals ranked in the top percentile of the analyzed group — those who have the highest likelihood of performing the desired action — businesses typically see improvements in efficiency metrics (cost per acquisition, return on ad spend, etc.). Here are some popular business use cases for propensity predictions:
- Reaching likely buyers
- Driving repeat purchases
- Scoring and prioritizing leads