1. What are Decision Trees?
Decision trees are supervised learning
models used for classification and regression tasks.
- They model decisions as a tree structure, where each internal node
corresponds to a decision (usually a test on a feature), and each leaf
node corresponds to an output label or value.
- Essentially, the tree learns a hierarchy of if/else questions that
partition the input space into regions associated with specific outputs.
2. How Decision Trees Work
- The model splits the dataset based on feature values in a
way that increases the purity
of the partitions (i.e., groups that are more homogeneous with respect to
the target).
- At each node, the algorithm evaluates possible splits on features and selects the one that best separates the data, according to a criterion such as Gini impurity, entropy (information gain), or mean squared error (for regression); a small numeric sketch of these measures follows this list.
- The process recursively continues splitting subsets until
a stopping criterion is met (e.g., maximum depth, minimum samples per
leaf).
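As a rough illustration of these criteria, the snippet below (a minimal sketch in plain Python, not scikit-learn's internal code; the split itself is made up) computes the Gini impurity and information gain for one hypothetical split of ten samples:

```python
# A minimal sketch (not library code) of the impurity measures a tree
# compares when scoring a candidate split.
import numpy as np

def gini(labels):
    """Gini impurity: 1 - sum_k p_k^2 over class proportions p_k."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def entropy(labels):
    """Entropy: -sum_k p_k * log2(p_k)."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

# A hypothetical split sends 4 samples left and 6 right.
left, right = np.array([0, 0, 0, 1]), np.array([1, 1, 1, 1, 0, 1])
parent = np.concatenate([left, right])

n = len(parent)
weighted_gini = len(left) / n * gini(left) + len(right) / n * gini(right)
info_gain = entropy(parent) - (len(left) / n * entropy(left)
                               + len(right) / n * entropy(right))
print(f"parent Gini = {gini(parent):.3f}, split Gini = {weighted_gini:.3f}")
print(f"information gain = {info_gain:.3f}")
```

A candidate split is considered good when the weighted impurity of the children is clearly lower than the impurity of the parent node.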
Example analogy from the book:
- To distinguish animals like bears, hawks, penguins, and dolphins, a decision tree asks questions like “Does the animal have feathers?” to split the dataset into smaller groups, continuing with further, more specific questions.
- Such questions form a tree structure where navigating from the root to a leaf corresponds to a series of questions and answers, leading to a classification decision.
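To make the analogy concrete, here is a small sketch that fits a scikit-learn tree on a hand-made animal table and prints the learned if/else rules; the feature names and values are invented for illustration, not taken from the book:

```python
# Toy sketch: fit a decision tree on a hand-made animal dataset and
# print the if/else rules it learns. Feature values are illustrative.
from sklearn.tree import DecisionTreeClassifier, export_text

feature_names = ["has_feathers", "can_fly", "has_fins"]
X = [
    [0, 0, 0],  # bear
    [1, 1, 0],  # hawk
    [1, 0, 0],  # penguin
    [0, 0, 1],  # dolphin
]
y = ["bear", "hawk", "penguin", "dolphin"]

tree = DecisionTreeClassifier(criterion="gini", random_state=0).fit(X, y)
print(export_text(tree, feature_names=feature_names))
```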
3. Advantages of Decision Trees
- Easy to understand and visualize: The
flow of decisions can be depicted as a tree, which is interpretable even
for non-experts (especially for small trees).
- No need for feature scaling: Decision trees are invariant to scaling or normalization since splits are based on thresholds on feature values rather than on distances (see the sketch after this list).
- Handles both numerical and categorical
data: Trees can work with a mix of continuous,
ordinal, and categorical features without special preprocessing.
- Automatic feature selection:
Only relevant features are used for splits, providing a form of feature
selection.
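The scaling claim can be checked directly. The sketch below (using the iris dataset purely as a convenient example) fits the same tree on raw and standardized features and compares predictions; barring floating-point tie-breaks, they should match:

```python
# Sketch: because splits depend only on per-feature thresholds,
# rescaling the features should not change a tree's predictions.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

raw = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

scaler = StandardScaler().fit(X_train)
scaled = DecisionTreeClassifier(random_state=0).fit(
    scaler.transform(X_train), y_train)

same = np.array_equal(raw.predict(X_test),
                      scaled.predict(scaler.transform(X_test)))
print("identical predictions with and without scaling:", same)
```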
4. Weaknesses of Decision Trees
- Tendency to overfit: Decision trees can grow very complex and fit the noise in the training data, leading to poor generalization performance (illustrated in the sketch after this list).
- Unstable:
Small variations in data can lead to very different trees.
- Greedy splits:
Recursive partitioning is greedy and locally optimal but not guaranteed to
find the best overall tree.
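A quick sketch of the overfitting tendency, using the breast cancer dataset as an arbitrary stand-in: an unrestricted tree typically reaches perfect training accuracy, while a depth-limited tree usually generalizes at least as well on the test set:

```python
# Sketch: an unrestricted tree memorizes the training set; limiting depth
# usually narrows the train/test gap (dataset choice is illustrative).
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

full = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
pruned = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X_train, y_train)

for name, model in [("full depth", full), ("max_depth=4", pruned)]:
    print(f"{name}: train={model.score(X_train, y_train):.3f}, "
          f"test={model.score(X_test, y_test):.3f}")
```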
Due to these issues, single decision trees are often outperformed by ensemble methods like random forests and gradient-boosted trees.
5. Parameters and Tuning
Key parameters controlling
decision tree construction:
- max_depth:
Maximum depth of the tree. Limiting depth controls overfitting.
- min_samples_split:
Minimum number of samples required to split a node.
- min_samples_leaf:
Minimum number of samples required to be at a leaf node.
- max_features: The
number of features to consider when looking for the best split.
- criterion: The function used to measure split quality, e.g. "gini" or "entropy" for classification, and "squared_error" (named "mse" in older scikit-learn versions) for regression.
Proper tuning of these parameters
helps optimize the balance between underfitting and overfitting.
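A hedged sketch of tuning a few of these parameters with cross-validated grid search; the grid values below are illustrative rather than recommendations:

```python
# Sketch: search over a few of the parameters above with cross-validation.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

param_grid = {
    "max_depth": [3, 5, 10, None],
    "min_samples_leaf": [1, 5, 10],
    "criterion": ["gini", "entropy"],
}
search = GridSearchCV(DecisionTreeClassifier(random_state=0),
                      param_grid, cv=5)
search.fit(X_train, y_train)

print("best parameters:", search.best_params_)
print("test accuracy:", round(search.score(X_test, y_test), 3))
```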
6. Extensions: Ensembles of Decision
Trees
To overcome the limitations of
single trees, ensemble methods combine multiple trees for better performance
and stability:
- Random Forests:
Build many decision trees on bootstrap samples of data and average the
results, injecting randomness by limiting features for splits to reduce
overfitting.
- Gradient Boosted Decision Trees:
Sequentially build trees that correct errors of previous ones, resulting
in often more accurate but slower-to-train models.
Both approaches maintain some
advantages of trees (e.g., no need for scaling, interpretability of base
learners) while significantly enhancing performance.
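A short sketch of both ensemble types in scikit-learn; the settings are chosen only for illustration:

```python
# Sketch: tree ensembles expose the same fit/score interface as a single tree.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Random forest: many trees on bootstrap samples, limited features per split.
forest = RandomForestClassifier(n_estimators=100, max_features="sqrt",
                                random_state=0).fit(X_train, y_train)

# Gradient boosting: shallow trees built sequentially to correct errors.
boost = GradientBoostingClassifier(max_depth=3, learning_rate=0.1,
                                   random_state=0).fit(X_train, y_train)

print("random forest test accuracy:", round(forest.score(X_test, y_test), 3))
print("gradient boosting test accuracy:", round(boost.score(X_test, y_test), 3))
```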
7. Visualization of Decision
Trees
- Because the model structure corresponds directly to human-understandable decisions, decision trees can be visualized as flowcharts.
- Visualization aids in understanding model decisions and
debugging.
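For example, a fitted scikit-learn tree can be drawn as a flowchart with plot_tree (a minimal sketch; matplotlib is required):

```python
# Sketch: draw a fitted tree as a flowchart.
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, plot_tree

data = load_iris()
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(data.data, data.target)

plot_tree(tree, feature_names=data.feature_names,
          class_names=list(data.target_names), filled=True)
plt.show()
```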
8. Summary
- Model Type: Hierarchical if/else decision rules forming a tree
- Tasks: Classification and regression
- Strengths: Interpretable, no scaling needed, handles mixed data
- Weaknesses: Prone to overfitting, unstable with small changes
- Key Parameters: max_depth, min_samples_split, criterion, max_features
- Use in Ensembles: Building block for robust models like Random Forests and Gradient Boosted Trees