1. What is Predicting Probabilities?
- The predict_proba method estimates the probability that a given input belongs to each class.
- It returns values in the range [0, 1], representing the model's confidence as probabilities.
- The predicted probabilities across all classes for a sample always sum to 1 (i.e., they form a valid probability distribution).
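The two properties above can be checked directly on a fitted model. The following is a minimal sketch; the synthetic dataset and the choice of LogisticRegression are assumptions made only for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic data and model choice are assumptions for illustration only.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
clf = LogisticRegression().fit(X, y)

proba = clf.predict_proba(X[:5])  # one row per sample, one column per class

# Every entry lies in [0, 1] ...
in_unit_interval = bool(np.all((proba >= 0) & (proba <= 1)))
# ... and each row sums to 1 (up to floating-point rounding).
rows_sum_to_one = bool(np.allclose(proba.sum(axis=1), 1.0))

print(proba)
print("All values in [0, 1]:", in_unit_interval)
print("Rows sum to 1:", rows_sum_to_one)
```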
2. Output Shape of predict_proba
- For binary classification, the shape of the output is (n_samples, 2):
  - Column 0: Probability of the sample belonging to the negative class.
  - Column 1: Probability of the sample belonging to the positive class.
- For multiclass classification, the shape is (n_samples, n_classes), with each column giving the probability that the sample belongs to that class.
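As a quick check of the shapes described above, the sketch below fits one binary and one multiclass model. The datasets (a synthetic binary problem and the three-class iris data) and the classifier are assumptions chosen for illustration.

```python
from sklearn.datasets import load_iris, make_classification
from sklearn.linear_model import LogisticRegression

# Binary problem: two columns, one per class.
X_bin, y_bin = make_classification(n_samples=100, n_features=4, random_state=0)
bin_clf = LogisticRegression().fit(X_bin, y_bin)
bin_shape = bin_clf.predict_proba(X_bin).shape

# Multiclass problem (iris has 3 classes): one column per class.
X_multi, y_multi = load_iris(return_X_y=True)
multi_clf = LogisticRegression(max_iter=1000).fit(X_multi, y_multi)
multi_shape = multi_clf.predict_proba(X_multi).shape

print("Binary:", bin_shape)        # (100, 2)
print("Multiclass:", multi_shape)  # (150, 3)
```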
3. Interpretation of predict_proba Output
- The probability reflects how confidently the model believes a data point belongs to each class.
- For example, in binary classification:
Sample | P(negative class) | P(positive class)
1 | 0.2 | 0.8
2 | 0.9 | 0.1
- The model predicts the positive class if the positive-class probability exceeds a threshold (0.5 by default), so sample 1 above is classified as positive and sample 2 as negative.
4. Relation to Thresholding and Classification
- The default threshold for making classification decisions is 0.5:
  - If predict_proba for the positive class is > 0.5, the sample is classified as positive.
  - Otherwise, it is classified as negative.
- You can adjust this threshold depending on the problem, which changes the false positive and false negative rates.
- Adjusting the threshold can optimize metrics such as precision, recall, and F-score, especially on imbalanced datasets.
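The effect of moving the threshold can be sketched as follows. The imbalanced synthetic dataset, the LogisticRegression model, and the alternative threshold of 0.2 are all assumptions chosen for illustration, not recommended values.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

# Imbalanced synthetic data (roughly 10% positives); an assumption for illustration.
X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
pos_proba = clf.predict_proba(X_test)[:, 1]  # column 1 = positive class

results = {}
for threshold in (0.5, 0.2):  # 0.2 is an arbitrary example value
    # Apply the threshold by hand instead of calling clf.predict.
    y_pred = (pos_proba > threshold).astype(int)
    results[threshold] = (
        precision_score(y_test, y_pred, zero_division=0),
        recall_score(y_test, y_pred),
        int(y_pred.sum()),
    )
    precision, recall, n_pos = results[threshold]
    print(
        f"threshold={threshold}: precision={precision:.2f}, "
        f"recall={recall:.2f}, predicted positives={n_pos}"
    )
```

Lowering the threshold can only add predicted positives, so recall never decreases, while precision typically drops.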
5. Calibration of Probability Estimates
- Not all models produce well-calibrated probabilities.
- A calibrated model outputs probabilities that closely match true likelihoods: among samples given a predicted probability of 0.8, roughly 80% should actually belong to that class.
- Example of poor calibration: a decision tree grown to full depth may assign probabilities of exactly 0 or 1 and still often be wrong.
- Calibration can be improved using methods like:
  - Platt scaling
  - Isotonic regression
- Reference: Niculescu-Mizil and Caruana, “Predicting Good Probabilities with Supervised Learning”.
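In scikit-learn, both calibration methods are available through CalibratedClassifierCV, where method="sigmoid" corresponds to Platt scaling and method="isotonic" to isotonic regression. The sketch below is one possible setup, assuming a noisy synthetic dataset and 5-fold calibration; it only shows that a fully grown tree emits hard 0/1 probabilities while the calibrated versions emit intermediate values.

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Noisy synthetic data (flip_y adds label noise); an assumption for illustration.
X, y = make_classification(n_samples=2000, n_features=20, flip_y=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A fully grown tree has pure leaves, so its probabilities are only 0 or 1.
tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
tree_proba = tree.predict_proba(X_test)[:, 1]

# Platt scaling (sigmoid) and isotonic regression, each fitted with 5-fold CV.
platt = CalibratedClassifierCV(
    DecisionTreeClassifier(random_state=0), method="sigmoid", cv=5
).fit(X_train, y_train)
isotonic = CalibratedClassifierCV(
    DecisionTreeClassifier(random_state=0), method="isotonic", cv=5
).fit(X_train, y_train)

platt_proba = platt.predict_proba(X_test)[:, 1]
isotonic_proba = isotonic.predict_proba(X_test)[:, 1]

print("Uncalibrated tree values:", np.unique(tree_proba))
print("Platt range:", platt_proba.min(), platt_proba.max())
print("Isotonic range:", isotonic_proba.min(), isotonic_proba.max())
```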
6. Examples Using predict_proba (from the book)
- Using a GradientBoostingClassifier on toy datasets:
# Suppose gbrt is a trained GradientBoostingClassifier
print("Shape of probabilities:", gbrt.predict_proba(X_test).shape)
# Output:
# Shape of probabilities: (n_samples, 2)
print("Predicted probabilities:\n", gbrt.predict_proba(X_test[:6]))
- The output shows the predicted probabilities for each class:
[[0.1 0.9]
 [0.8 0.2]
 [0.7 0.3]
 ...
]
- The first column holds the probability of the first class, the second column that of the second class.
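Which class counts as "first" and "second" is given by the fitted model's classes_ attribute. Here is a self-contained sketch; the make_circles data and the "blue"/"red" label names are assumptions for illustration and are not necessarily the dataset behind the output above.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Toy data with string labels; dataset and label names are assumptions.
X, y = make_circles(noise=0.25, factor=0.5, random_state=1)
y_named = np.array(["blue", "red"])[y]
X_train, X_test, y_train, y_test = train_test_split(X, y_named, random_state=0)

gbrt = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)
proba = gbrt.predict_proba(X_test[:6])

# classes_ gives the column order of predict_proba.
print("Classes:", gbrt.classes_)
print("Shape of probabilities:", proba.shape)
print("Predicted probabilities:\n", proba)

# Picking the most probable column and mapping it through classes_
# recovers the labels that predict returns.
labels_from_proba = gbrt.classes_[np.argmax(proba, axis=1)]
print("Labels from probabilities:", labels_from_proba)
```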
7. Advantages of predict_proba
- Provides interpretable uncertainty estimates in terms of probabilities.
- Useful for decision making where probabilistic thresholds are preferable to hard decisions.
- Can be integrated into pipelines that weigh risks (e.g., medical diagnosis, fraud detection).
- Helps rank samples by probability to prioritize further analysis.
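Ranking by probability, as mentioned in the last point, amounts to sorting on the positive-class column. This is a minimal sketch, assuming a synthetic dataset, a RandomForestClassifier, and an arbitrary cutoff of the top 10 samples.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic data and model choice are assumptions for illustration only.
X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
pos_proba = clf.predict_proba(X_test)[:, 1]  # positive-class probability

# Indices of test samples, most likely positive first.
ranking = np.argsort(-pos_proba)
top_k = ranking[:10]  # k = 10 is an arbitrary example value

for idx in top_k:
    print(f"sample {idx}: P(positive) = {pos_proba[idx]:.2f}")
```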
8. Relationship Between predict_proba and decision_function
- Some classifiers implement both decision_function and predict_proba:
  - decision_function returns raw scores or margins.
  - predict_proba converts these scores to probabilities.
- In many models (logistic regression, for example), the probabilities are obtained by applying a logistic function or softmax to the decision function scores.
- Calibrated models provide better probability estimates than raw scores alone.
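For a binary LogisticRegression with default settings, this relationship can be verified numerically: passing the decision_function scores through the logistic function (scipy.special.expit) reproduces the positive-class column of predict_proba. The sketch assumes a synthetic dataset; other classifiers may map scores to probabilities differently.

```python
import numpy as np
from scipy.special import expit  # logistic (sigmoid) function
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic binary data; an assumption for illustration only.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
clf = LogisticRegression().fit(X, y)

scores = clf.decision_function(X[:5])  # raw scores, shape (5,)
proba = clf.predict_proba(X[:5])       # probabilities, shape (5, 2)

# The logistic function of the raw score equals the positive-class probability.
matches = bool(np.allclose(expit(scores), proba[:, 1]))

print("Scores:", scores)
print("P(positive):", proba[:, 1])
print("expit(scores) matches predict_proba:", matches)
```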
9. Practical Considerations
- When probabilities are needed (e.g., for risk assessment), prefer models that support predict_proba.
- Be aware that the probabilities are only as good as the model's calibration.
- Always validate probabilities with calibration plots or metrics such as the Brier score.
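Both checks are available in scikit-learn as brier_score_loss and calibration_curve. The sketch below assumes a synthetic dataset, a LogisticRegression model, and 5 bins; it prints the numbers that a calibration plot would draw rather than plotting them.

```python
from sklearn.calibration import calibration_curve
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import brier_score_loss
from sklearn.model_selection import train_test_split

# Synthetic data and model choice are assumptions for illustration only.
X, y = make_classification(n_samples=2000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
pos_proba = clf.predict_proba(X_test)[:, 1]

# Brier score: mean squared difference between probability and outcome (lower is better).
brier = brier_score_loss(y_test, pos_proba)
print(f"Brier score: {brier:.3f}")

# Calibration curve: per bin, observed fraction of positives vs. mean predicted probability.
frac_pos, mean_pred = calibration_curve(y_test, pos_proba, n_bins=5)
for predicted, observed in zip(mean_pred, frac_pos):
    print(f"mean predicted = {predicted:.2f}, observed fraction = {observed:.2f}")
```

For a well-calibrated model, the two columns printed for each bin stay close to each other.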
10. Summary Table
Aspect | Description
Purpose | Provides class membership probabilities
Output shape | Binary: (n_samples, 2); multiclass: (n_samples, n_classes)
Values | Probabilities between 0 and 1, summing to 1 per sample
Default threshold | 0.5 for binary classification
Calibration | Models may need calibration for accurate probabilities
Applications | Threshold tuning, risk assessment, ranking predictions
Relation | Derived from decision_function scores via logistic or softmax
Example models | GradientBoostingClassifier, Logistic Regression, Random Forest