Propensity vs probability: Understanding the difference between raw scores and probabilities

In the world of machine learning, we often encounter two important concepts: the raw propensity score and probability. Though they may sound similar, they represent distinct aspects of a machine learning model's predictions.

Example of propensity scores vs predicted probabilities

Imagine you have a model that predicts whether it will rain tomorrow. The raw propensity score is like the model's initial hunch or gut feeling about the outcome. It's a numeric value that the model calculates based on various factors like historical weather data, humidity, temperature, and more. The propensity score indicates how likely the model thinks it will rain or not. However, the propensity score itself doesn't give us a clear understanding of how confident the model is in its prediction.

This is where probability comes into play. The predicted probability answers the question “how likely is this outcome?” For example, a raw score of 0.75 for rain tells us only that the model leans toward rain, and leans harder than it would for a day scored 0.60. A probability of 0.75, by contrast, means there is a 75% chance that it will rain tomorrow.

In summary, the raw propensity score is the model's initial guess, while the probability expresses that guess as a likelihood that can be read at face value. Understanding this difference can help us interpret machine learning predictions and make better-informed decisions based on the model's level of certainty.
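As a rough illustration of the gap between the two, the sketch below assumes the raw score is a log-odds value, which is common for logistic-style models but is not stated anywhere above, and maps it onto the 0 to 1 range with the logistic function. The function name and the sample scores are hypothetical. Squashing a score into that range does not by itself make it a trustworthy probability.

```python
import math


def sigmoid(raw_score: float) -> float:
    """Map a real-valued raw score (assumed to be log-odds) into the range 0..1."""
    # Two branches keep math.exp from overflowing for large negative scores.
    if raw_score >= 0:
        return 1.0 / (1.0 + math.exp(-raw_score))
    z = math.exp(raw_score)
    return z / (1.0 + z)


# Hypothetical raw scores for four different days.
raw_scores = [-2.0, 0.0, 1.1, 3.0]
probabilities = [sigmoid(s) for s in raw_scores]

for s, p in zip(raw_scores, probabilities):
    print(f"raw score {s:+.1f} -> value on the probability scale {p:.3f}")
```

The ordering of the days is unchanged by the transformation; only the scale is different.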

Understanding model calibration

Model calibration refers to the alignment between a model's predicted probabilities and the actual likelihood of events occurring. Good calibration means that the predicted probabilities are close to the frequencies actually observed. Not all machine learning models are well calibrated after training. Some models are overconfident, producing probabilities that are too extreme (too close to 0 or 1), while others are underconfident, producing probabilities that cluster too close to the middle of the range.

Importance of model calibration

Calibration is crucial because it impacts decision-making. If a model is well-calibrated, we can trust its probability estimates and use them to make informed choices, such as whether to take an umbrella based on the probability of rain.

Calibrating a model involves post-processing its scores to bring them in line with the true probabilities. Various calibration techniques can be employed, such as Platt scaling or isotonic regression, which adjust the raw propensity scores to produce well-calibrated probabilities. These techniques are typically applied during model validation and fine-tuning, using data the model was not trained on, so that the model's probability estimates are more accurate and reliable.
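To make the idea concrete, here is a minimal sketch of isotonic regression using the pool-adjacent-violators procedure, written in plain Python. It is an illustration under stated assumptions, not a description of any particular product's calibration step: the function names and the held-out data are hypothetical, and the scores are assumed to be distinct. In practice a maintained library implementation would normally be preferred.

```python
import bisect


def fit_isotonic(scores, labels):
    """Pool Adjacent Violators on (score, label) pairs.

    Returns the scores in ascending order and a non-decreasing list of
    calibrated values, one per score. Assumes the scores are distinct.
    """
    pairs = sorted(zip(scores, labels))
    blocks = []  # each block is [sum_of_labels, count]
    for _, label in pairs:
        blocks.append([float(label), 1])
        # Merge backwards while an earlier block has a higher mean than a later one.
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return [s for s, _ in pairs], fitted


def calibrate(score, sorted_scores, fitted):
    """Step-function lookup: use the fitted value of the closest score at or below."""
    i = bisect.bisect_right(sorted_scores, score) - 1
    return fitted[max(i, 0)]


# Hypothetical held-out data: raw scores and whether it actually rained (1) or not (0).
held_out_scores = [0.2, 0.5, 0.9, 1.4, 2.0, 2.7, 3.1, 3.8]
held_out_labels = [0, 0, 1, 0, 0, 1, 1, 1]

xs, fitted = fit_isotonic(held_out_scores, held_out_labels)
print(calibrate(1.0, xs, fitted))
```

Because the fitted mapping never decreases as the raw score increases, the calibrated values never reverse the order of two instances, although distinct raw scores can end up sharing one calibrated value.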

The calibration process ensures that when the model predicts a probability, it reflects the true likelihood of the event occurring. For example, if a model predicts a 70% probability of rain, it should actually rain on approximately 70% of the days for which it makes that prediction.
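One simple way to check this property is to group predictions into bins and compare the average predicted probability in each bin with the share of events that actually occurred. The sketch below assumes equal-width bins and uses made-up forecasts; the function name is hypothetical.

```python
def reliability_table(probs, outcomes, n_bins=5):
    """Return (mean predicted probability, observed frequency, count) per non-empty bin."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        # A probability of exactly 1.0 is placed in the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))
    rows = []
    for members in bins:
        if not members:
            continue
        mean_pred = sum(p for p, _ in members) / len(members)
        observed = sum(y for _, y in members) / len(members)
        rows.append((mean_pred, observed, len(members)))
    return rows


# Made-up history: ten days forecast at 0.7 (rain on 7) and ten at 0.2 (rain on 2).
probs = [0.7] * 10 + [0.2] * 10
outcomes = [1] * 7 + [0] * 3 + [1] * 2 + [0] * 8

for mean_pred, observed, count in reliability_table(probs, outcomes):
    print(f"predicted {mean_pred:.2f}  observed {observed:.2f}  n={count}")
```

For a well-calibrated model the two columns stay close in every bin. With real data, bins containing few predictions give noisy observed frequencies, so the counts matter when reading the table.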

In summary, model calibration in machine learning is all about making sure a model's predicted probabilities accurately reflect the real-world likelihood of events happening. A well-calibrated model is more reliable and can lead to more effective decision-making.

Benefits of using predicted probabilities

Using predicted probabilities instead of raw propensity scores offers several significant benefits that enhance the interpretability, decision-making, and overall performance of the models:

Interpretability

Predicted probabilities provide a more intuitive and understandable representation of a model's output. Raw propensity scores are often abstract values that lack clear context, while probabilities are easily interpretable as the likelihood of an event occurring. This makes it easier for stakeholders and end-users to comprehend the model's predictions and build trust in its results.

Uncertainty quantification

Predicted probabilities convey the model's level of certainty about its predictions. Models can provide probability scores ranging from 0 to 1, indicating their confidence in the predicted outcomes. This uncertainty quantification is valuable in decision-making, as it allows users to consider the risk associated with a prediction and adjust their actions accordingly.

Threshold selection

Predicted probabilities offer flexibility in choosing decision thresholds. Depending on the specific application and requirements, decision-makers can adjust the probability threshold for classifying positive and negative outcomes. This adaptability allows the model's behavior to be tuned to the desired trade-off between false positives and false negatives. For example, in lead conversion, a profitability threshold can be established by taking into account the cost of acquisition for each lead, the expected profit from actual customers, and the probability of conversion.
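The lead conversion example can be sketched as a break-even calculation. The version below rests on simplifying assumptions that are not in the text above: the acquisition cost is paid for every lead pursued, the profit figure is measured before subtracting that cost, and the predicted probabilities are well calibrated. All names and numbers are hypothetical.

```python
def break_even_probability(acquisition_cost, profit_per_customer):
    """Conversion probability at which pursuing a lead has zero expected profit."""
    if profit_per_customer <= 0:
        raise ValueError("profit_per_customer must be positive")
    return acquisition_cost / profit_per_customer


def expected_profit(probability, acquisition_cost, profit_per_customer):
    """Expected profit of pursuing one lead under the assumptions stated above."""
    return probability * profit_per_customer - acquisition_cost


# Hypothetical figures.
acquisition_cost = 20.0
profit_per_customer = 100.0
threshold = break_even_probability(acquisition_cost, profit_per_customer)

# Hypothetical predicted conversion probabilities per lead.
leads = {"lead_a": 0.05, "lead_b": 0.25, "lead_c": 0.60}
worth_pursuing = [name for name, p in leads.items() if p > threshold]

print(f"break-even probability: {threshold:.2f}")
print("leads above the threshold:", worth_pursuing)
```

This reasoning only works on a probability scale. An uncalibrated raw score cannot be multiplied by a profit figure to obtain an expected value.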

Calibration and reliability

When models are well-calibrated, predicted probabilities accurately reflect the real-world likelihood of events. Calibrated probabilities provide more reliable estimates, leading to improved decision-making and better performance in critical applications.

Ensemble and stacking

Predicted probabilities facilitate model ensembling and stacking techniques. By combining the probabilities from multiple models, ensemble methods can often yield better overall performance compared to individual models or raw propensity score-based approaches.
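As a minimal sketch of one such approach, the example below averages the probabilities from two hypothetical models, optionally with weights. It assumes both models already output values on the same probability scale; raw scores on different scales could not be averaged meaningfully this way. The function name and the numbers are illustrative only.

```python
def average_probabilities(per_model_probs, weights=None):
    """Weighted average of per-model probabilities, instance by instance."""
    n_models = len(per_model_probs)
    if weights is None:
        weights = [1.0] * n_models
    total_weight = sum(weights)
    n_items = len(per_model_probs[0])
    return [
        sum(w * probs[i] for w, probs in zip(weights, per_model_probs)) / total_weight
        for i in range(n_items)
    ]


# Hypothetical predictions from two models for the same three instances.
model_a = [0.80, 0.30, 0.55]
model_b = [0.70, 0.10, 0.65]

blended = average_probabilities([model_a, model_b])
print([round(p, 3) for p in blended])
```

Averaging is only one option. Stacking instead trains a second model on the individual models' outputs.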

Benefits and drawbacks of raw propensity scores

Using raw predictions from models instead of calibrated scores has its own set of pros and cons. Let's explore them.

Pros of using raw propensity scores:

Simplicity and Speed: Raw predictions are the direct output of the model without any additional post-processing, making them easier and faster to obtain. In real-time applications or scenarios where efficiency is critical, using raw propensity scores can be advantageous.

Suitable for Ranking: Raw propensity scores can still be useful for ranking predictions. Even though they may not directly represent probabilities, they can help rank instances based on their likelihood of being a positive or negative outcome, which is often sufficient in certain applications.

Cons of using raw propensity scores:

Inconsistent Ranking Across Models: Raw propensity scores might not consistently reflect the true likelihood of events occurring. The same raw score can correspond to significantly different actual probabilities depending on the model, the segment, or the time period, leading to inaccuracies when instances scored in different contexts are ranked or compared against each other.

Suboptimal Decision Thresholds: Using raw predictions often means relying on fixed decision thresholds (e.g., 0.5 for binary classification), which might not be suitable for all applications. This can lead to suboptimal trade-offs between precision and recall.

Misinterpretation of Scores: When raw propensity scores are not calibrated, users might incorrectly interpret them as reliable probabilities, leading to misguided decisions and a lack of trust in the model's predictions.

In summary, while using raw predictions can be straightforward and computationally efficient, it comes with the trade-off of lacking calibrated probabilities and potential inconsistencies in ranking. In many real-world applications, calibrated scores are preferred as they provide interpretable probabilities, better decision thresholds, and improved trust in the model's predictions.

When should you use propensity score percentiles vs. predicted probabilities?

In the field of machine learning, the choice between using percentiles and probabilities depends on the specific context and the nature of the problem being addressed.

Using propensity score percentiles

Outliers are a concern: If your scores contain extreme values or outliers, percentiles give a more robust picture of the distribution. For instance, the 95th percentile is the value below which 95% of the data lies, and it does not change when the most extreme values become even more extreme.

Data skewness: When dealing with heavily skewed datasets, especially those with long tails, percentiles can be more informative than probabilities. Percentiles provide a clearer understanding of the data's spread and central tendencies, even in the presence of skewed distributions.

Interpretability: Percentiles can be easier to interpret and explain to non-technical stakeholders. For example, stating that a value is at the 75th percentile means that it is higher than 75% of the other values in the dataset.
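The points above can be sketched with a small percentile-rank helper. Here the percentile rank is defined as the share of reference scores strictly below a given score, which is one of several common conventions; the function name and the reference scores are hypothetical.

```python
import bisect


def percentile_rank(score, sorted_reference):
    """Percentage of reference scores strictly below `score` (0 to 100)."""
    below = bisect.bisect_left(sorted_reference, score)
    return 100.0 * below / len(sorted_reference)


# Hypothetical raw scores for a reference population.
reference_scores = sorted([0.3, 1.2, 0.7, 2.5, 0.1, 1.9, 0.9, 4.0])

# The same population with its largest value replaced by an extreme outlier.
with_outlier = sorted([0.3, 1.2, 0.7, 2.5, 0.1, 1.9, 0.9, 400.0])

print(percentile_rank(1.9, reference_scores))
print(percentile_rank(1.9, with_outlier))
```

Both calls return the same rank, which illustrates the robustness to outliers. A percentile rank describes position within the population and says nothing about how likely the event is.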

Using predicted probabilities

Well-calibrated predictions: Probabilities are essential when you need calibrated predictions. If you want reliable estimates of the likelihood of certain events, probabilities are more suitable. Well-calibrated models can provide accurate and informative probabilities for decision-making.

Decision thresholds: In many applications, using probabilities allows you to set appropriate decision thresholds based on your specific needs. For instance, in medical testing, you might want to set a higher threshold to prioritize precision over recall, depending on the consequences of false positives and false negatives.

Uncertainty quantification: Probabilities are crucial for quantifying uncertainty around predictions. Confidence intervals and prediction intervals can be constructed using probabilities, giving a range of likely outcomes.
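The effect of moving a decision threshold can be sketched by computing precision and recall at two cutoffs. The predictions and labels below are made up for illustration, and the function name is hypothetical.

```python
def precision_recall(probs, labels, threshold):
    """Precision and recall when predicting positive for probs >= threshold."""
    tp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 0)
    fn = sum(1 for p, y in zip(probs, labels) if p < threshold and y == 1)
    # None marks an undefined ratio (no predicted or no actual positives).
    precision = tp / (tp + fp) if (tp + fp) else None
    recall = tp / (tp + fn) if (tp + fn) else None
    return precision, recall


# Made-up predicted probabilities and true outcomes.
probs = [0.95, 0.85, 0.70, 0.60, 0.40, 0.30, 0.10]
labels = [1, 1, 0, 1, 1, 0, 0]

for threshold in (0.5, 0.8):
    precision, recall = precision_recall(probs, labels, threshold)
    print(f"threshold {threshold}: precision={precision}, recall={recall}")
```

In this made-up data, raising the threshold removes the false positive but misses more true positives, which is the precision versus recall trade-off described above.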

In summary, use percentiles when you need robust representation of data distribution and are concerned about outliers or skewed data. On the other hand, use probabilities when you require calibrated predictions, decision thresholds, and uncertainty quantification, especially in the context of machine learning models.
