1. What is the Decision Function?
- The decision_function method is provided by many classifiers in scikit-learn.
- It returns a continuous score for each sample, representing the classifier's confidence or margin.
- This score reflects how strongly the model favors one class over the other in binary classification, or a set of per-class scores in multiclass classification.
2. Shape and Output of decision_function
- For binary classification, the output shape is (n_samples,).
- Each value is a floating-point number indicating the degree to which the sample belongs to the positive class.
- Positive values indicate a preference for the positive class; negative values indicate a preference for the negative class.
- For multiclass classification, the output is usually a 2D array of shape (n_samples, n_classes), providing one score per class.
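The shapes described above can be checked directly. A minimal sketch using synthetic data (not from the book; the dataset and classifier choice are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

# Binary problem: decision_function returns one score per sample.
Xb, yb = make_classification(n_samples=100, n_classes=2, random_state=0)
clf_b = LinearSVC(max_iter=10000).fit(Xb, yb)
print(clf_b.decision_function(Xb[:5]).shape)  # (5,)

# Multiclass problem: one score per class per sample.
Xm, ym = make_classification(n_samples=100, n_classes=3,
                             n_informative=4, random_state=0)
clf_m = LinearSVC(max_iter=10000).fit(Xm, ym)
print(clf_m.decision_function(Xm[:5]).shape)  # (5, 3)
```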
3. Interpretation of decision_function Scores
- The sign of the value (positive or negative) determines the predicted class.
- The magnitude represents the confidence, i.e. the "distance" from the decision boundary.
- The larger the absolute value, the more confident the model is in its classification.
Example:
print("Decision function values:\n", classifier.decision_function(X_test)[:6])
# Outputs something like:
# [ 4.5 -1.2  0.3  5.  -3.1 ...]
- Here, values like 4.5 or 5.0 indicate strong confidence in the positive class; -1.2 and -3.1 favor the negative class, with -3.1 the more confident of the two.
4. Relationship to Prediction Threshold
- For binary classifiers, prediction is derived by thresholding the score:
- Predicted class = positive if decision_function score > 0.
- Predicted class = negative otherwise.
- This threshold can be adjusted:
- Raising or lowering it changes the trade-off between false positives and false negatives.
- Adjusting the threshold can improve metrics like precision and recall on imbalanced data.
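The thresholding above can be sketched directly on raw scores. A hedged example (the score values are illustrative, not from any fitted model):

```python
import numpy as np

scores = np.array([4.5, -1.2, 0.3, 5.0, -3.1])

# Default threshold of 0: the sign of the score decides the class.
default_pred = (scores > 0).astype(int)
print(default_pred)  # [1 0 1 1 0]

# Stricter threshold of 1.0: fewer positives, trading recall for precision.
strict_pred = (scores > 1.0).astype(int)
print(strict_pred)   # [1 0 0 1 0]
```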
5. Examples of Classifiers Using decision_function
- Support Vector Machines (SVMs) use decision_function to report signed distances from the decision boundary.
- GradientBoostingClassifier also provides decision_function for more granular confidence.
- Logistic regression provides both: decision_function returns the log-odds, and predict_proba returns the corresponding probabilities.
6. Advantages of decision_function Over predict_proba
- decision_function outputs raw scores, which can be more informative for some models.
- These raw scores can be transformed into probabilities with calibration methods like Platt scaling.
- For models like SVC, predict_proba is effectively a calibrated wrapper over decision_function.
- Users can set custom thresholds on decision_function scores to better control classification decisions.
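The calibration step mentioned above can be sketched with scikit-learn's CalibratedClassifierCV, which applies Platt scaling when method="sigmoid". A minimal example on synthetic data (dataset and parameters are illustrative):

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=200, random_state=0)

# LinearSVC has decision_function but no predict_proba; wrapping it with
# sigmoid calibration (Platt scaling) yields probability estimates.
calibrated = CalibratedClassifierCV(LinearSVC(max_iter=10000),
                                    method="sigmoid", cv=3)
calibrated.fit(X, y)
proba = calibrated.predict_proba(X[:5])
print(proba.shape)  # (5, 2); each row sums to 1
```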
7. Use in Model Evaluation
- decision_function outputs enable construction of ROC curves, which plot the True Positive Rate against the False Positive Rate at different thresholds.
- By varying the decision threshold, you can evaluate model performance across the full range of operating points.
- Thus, decision_function is crucial for comprehensive model assessment beyond accuracy.
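Feeding decision_function scores into roc_curve and roc_auc_score looks roughly like this (a sketch on synthetic data; the dataset and classifier are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=300, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = LinearSVC(max_iter=10000).fit(X_train, y_train)

# roc_curve sweeps the threshold over the raw scores and returns one
# (FPR, TPR) point per threshold.
scores = clf.decision_function(X_test)
fpr, tpr, thresholds = roc_curve(y_test, scores)
print("AUC:", roc_auc_score(y_test, scores))
```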
8. Example Code Snippet (from the book)
from sklearn.ensemble import GradientBoostingClassifier
# Suppose we have a trained GradientBoostingClassifier called gbrt
print("X_test.shape:", X_test.shape)
print("Decision function shape:", gbrt.decision_function(X_test).shape)
print("Decision function:\n", gbrt.decision_function(X_test)[:6])
Output might be:
X_test.shape: (25, 2)
Decision function shape: (25,)
Decision function:
[ 4.   2.5  1.3  0.7 -1.2 -3.4]
Explanation: these values show the strength of the model's preference for the positive class.
9. Summary Points
Aspect | Description
Purpose | Measures confidence or margin in classification
Output (Binary) | Array of floats, shape (n_samples,), indicating class preference
Output (Multiclass) | Array of floats, shape (n_samples, n_classes), with scores per class
Interpretation | Positive = positive class, negative = negative class; magnitude = confidence
Thresholding | Default threshold at 0 to convert scores to class labels
Usage | Enables custom thresholds, ROC analysis, model calibration
Example models | SVM, Gradient Boosting, some ensemble classifiers