In the previous article, we learned how Decision Trees are built by repeatedly splitting data into smaller and purer groups.
A Decision Tree continues creating branches until one of the following happens:
- Nodes become pure
- Stopping criteria are reached
- No further useful splits exist
This flexibility makes Decision Trees powerful.
However, it also creates a major problem:
Overfitting
A tree can become so large that it starts memorizing training data instead of learning general patterns.
Tree Pruning is the solution to this problem.
Pruning removes unnecessary branches from a Decision Tree, making it simpler, faster, and better at generalizing to unseen data.
Why Do Decision Trees Overfit?
Consider a student preparing for an exam.
One student learns concepts.
Another memorizes every question from previous exams.
The second student may perform well on familiar questions but struggle with new ones.
Similarly:
Large Tree
↓
Memorizes Training Data
↓
Poor Generalization
This is overfitting.
Understanding Overfitting in Trees
Suppose we have:
100 Training Samples
A Decision Tree may continue splitting until:
1 Sample Per Leaf
Example:
Root
↓
Node
↓
Node
↓
Node
↓
Leaf
The tree becomes extremely deep.
Training accuracy:
100%
Test accuracy:
Low
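The gap between training and test accuracy can be reproduced with a small sketch. It assumes scikit-learn and a synthetic, deliberately noisy dataset from make_classification; the dataset, its noise level, and the variable names are illustrative choices, not taken from the example above.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data (an assumption for illustration): flip_y=0.2 assigns a
# random class to about 20% of the samples, which adds label noise.
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=4, flip_y=0.2, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# No depth or leaf-size limits: the tree keeps splitting until leaves are pure.
full_tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

train_acc = full_tree.score(X_train, y_train)
test_acc = full_tree.score(X_test, y_test)

print(f"depth={full_tree.get_depth()}, leaves={full_tree.get_n_leaves()}")
print(f"train accuracy={train_acc:.2f}, test accuracy={test_acc:.2f}")
```

On noisy data like this, the unrestricted tree fits the training set perfectly, while its test accuracy is noticeably lower. The exact numbers depend on the data and the random seed.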
Visualizing an Overfitted Tree
Root
├── Branch
│   ├── Branch
│   │   ├── Branch
│   │   │   ├── Branch
│   │   │   │   └── Leaf
Many unnecessary branches exist.
What is Tree Pruning?
Tree Pruning is the process of removing branches that contribute little to predictive performance.
Goal:
Complex Tree
↓
Simpler Tree
↓
Better Generalization
Pruning reduces model complexity.
Intuition Behind Pruning
Imagine a company decision process.
Original Process:
Question 1
Question 2
Question 3
Question 4
Question 5
Question 6
Many unnecessary questions.
Pruned Process:
Question 1
Question 2
Question 3
Same outcome.
Much simpler.
Pruning applies the same idea to Decision Trees.
Why Pruning Helps
Pruning:
- Reduces overfitting
- Improves generalization
- Simplifies interpretation
- Reduces computational cost
- Creates more stable models
Example (illustrative numbers)
Before Pruning:
Training Accuracy = 100%
Test Accuracy = 72%
After Pruning:
Training Accuracy = 95%
Test Accuracy = 85%
Slight reduction in training performance.
Major improvement in test performance.
Types of Tree Pruning
Two major approaches:
- Pre-Pruning
- Post-Pruning
Pre-Pruning
Pre-Pruning stops tree growth before the tree becomes too large.
Idea:
Prevent Overgrowth
Instead of building a huge tree and trimming it later, we stop growth early.
Pre-Pruning Example
Tree Construction:
Root
↓
Split
↓
Split
↓
Stop
Further splits are prevented.
Common Pre-Pruning Criteria
Maximum Depth
Limit tree depth.
Example:
max_depth = 5
The tree cannot grow deeper than 5 levels of splits below the root.
Minimum Samples Split
Require minimum samples before splitting.
Example:
min_samples_split = 20
Nodes with fewer than 20 samples cannot split.
Minimum Samples Leaf
Require minimum observations in leaf nodes.
Example:
min_samples_leaf = 10
Every leaf must contain at least 10 samples.
Minimum Information Gain
Allow splits only when they provide meaningful improvement.
Example:
Information Gain > Threshold
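As a hedged sketch of this criterion in scikit-learn: the closest parameter is min_impurity_decrease, which allows a split only if it reduces impurity by at least the given amount, weighted by the fraction of samples that reach the node. With criterion="entropy", the impurity decrease is the information gain of the split. The dataset and the threshold of 0.01 below are assumptions chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic noisy data (an assumption for illustration).
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=4, flip_y=0.2, random_state=0
)

# Baseline: every split that improves purity is allowed.
unrestricted = DecisionTreeClassifier(criterion="entropy", random_state=0).fit(X, y)

# Pre-pruned: a node is split only if the weighted impurity decrease
# (information gain scaled by the node's share of samples) is >= 0.01.
thresholded = DecisionTreeClassifier(
    criterion="entropy", min_impurity_decrease=0.01, random_state=0
).fit(X, y)

print("leaves without threshold:", unrestricted.get_n_leaves())
print("leaves with threshold:   ", thresholded.get_n_leaves())
```

Because the gain is weighted by node size, splits deep in the tree that separate only a handful of samples rarely clear the threshold, which is where most of the reduction in leaves comes from.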
Advantages of Pre-Pruning
- Faster training
- Smaller trees
- Lower memory usage
- Simple implementation
Limitations of Pre-Pruning
The algorithm may stop too early.
Example:
Potentially Useful Split
↓
Never Explored
This can reduce model accuracy.
Post-Pruning
Post-Pruning follows a different strategy.
Workflow:
Build Full Tree
↓
Evaluate Branches
↓
Remove Weak Branches
The tree grows completely before pruning begins.
Why Post-Pruning Often Works Better
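The algorithm first grows the tree fully, so a split that looks weak on its own is still explored and can be kept if the splits beneath it turn out to be valuable.
Only then does it remove the branches that do not justify their added complexity.
This often produces better trees than stopping early.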
Example
Original Tree:
Root
├── Branch A
├── Branch B
└── Branch C
    ├── D
    ├── E
    └── F
After Pruning:
Root
├── Branch A
├── Branch B
└── Branch C
The weak sub-branches D, E, and F are removed, and Branch C becomes a leaf.
Cost Complexity Pruning
Cost Complexity Pruning (also called weakest-link pruning) is one of the most common post-pruning techniques in modern Decision Trees.
It is the pruning method used in CART Decision Trees.
Core Idea
Balance:
Accuracy
vs
Complexity
A slightly less accurate tree may be preferred if it is significantly simpler.
Cost Complexity Formula
The pruning objective can be expressed as:
$$R_\alpha(T) = R(T) + \alpha \lvert T \rvert$$
Where:
- $R(T)$ = Tree error
- $\lvert T \rvert$ = Number of leaf nodes
- $\alpha$ = Complexity penalty
Larger trees receive larger penalties.
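A small worked example may make the trade-off concrete. It assumes the standard CART form of the objective, tree error plus alpha times the number of leaves, and compares two hypothetical trees; the error values and leaf counts are invented for illustration.

```python
def cost_complexity(error: float, n_leaves: int, alpha: float) -> float:
    """Penalized cost: tree error plus alpha times the number of leaves."""
    return error + alpha * n_leaves


# Hypothetical trees as (training error, number of leaves).
large_tree = (0.02, 40)
small_tree = (0.08, 8)

for alpha in (0.0, 0.001, 0.01):
    large_cost = cost_complexity(*large_tree, alpha)
    small_cost = cost_complexity(*small_tree, alpha)
    preferred = "large" if large_cost < small_cost else "small"
    print(f"alpha={alpha}: large={large_cost:.3f}, small={small_cost:.3f} -> {preferred}")
```

With these made-up numbers, the large tree has the lower cost at alpha = 0 and alpha = 0.001, while the small tree wins at alpha = 0.01. The break-even point is where the extra 32 leaves cost as much as the 0.06 gain in error, that is, alpha = 0.06 / 32 = 0.001875.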
Understanding Alpha (α)
Alpha controls pruning strength.
Small Alpha
Weak Pruning
Larger trees remain.
Large Alpha
Strong Pruning
More branches removed.
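The effect of alpha on tree size can be checked with a short sketch. It assumes scikit-learn's ccp_alpha parameter and a synthetic dataset; the specific alpha values are arbitrary choices for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic noisy data (an assumption for illustration).
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=4, flip_y=0.2, random_state=0
)

alphas = [0.0, 0.005, 0.02, 0.1]  # arbitrary values, from no pruning to strong pruning
leaf_counts = []

for alpha in alphas:
    # Same data and seed each time, so only the pruning strength changes.
    tree = DecisionTreeClassifier(ccp_alpha=alpha, random_state=0).fit(X, y)
    leaf_counts.append(tree.get_n_leaves())
    print(f"ccp_alpha={alpha}: leaves={tree.get_n_leaves()}, depth={tree.get_depth()}")
```

The number of leaves never increases as alpha grows, because each larger alpha prunes further back from the same fully grown tree.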
Visualizing Pruning
Before:
Large Tree
↓
Many Branches
After:
Smaller Tree
↓
Essential Branches Only
Pruning and Bias-Variance Tradeoff
Large Tree:
Low Bias
High Variance
Pruned Tree:
Slightly Higher Bias
Lower Variance
Often improves overall performance.
Example: Loan Approval
Features:
- Credit Score
- Income
- Employment Status
- Age
- Location
Overfitted Tree:
Uses Every Detail
Pruned Tree:
Uses Only Important Factors
Generalizes better.
Example: Disease Prediction
Overfitted Tree:
May memorize rare patient cases.
Pruned Tree:
Focuses on meaningful medical patterns.
Pruning and Interpretability
Small trees are easier to understand.
Example:
Before:
25 Decision Rules
After:
5 Decision Rules
Interpretation becomes much simpler.
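One way to see the rules of a small tree is scikit-learn's export_text helper. The sketch below uses the bundled Iris dataset purely as a stand-in (it is not part of the examples above) and limits depth to 2 so the printed rules stay short.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

# Iris is used only as a convenient stand-in dataset.
iris = load_iris()

# A shallow tree yields a handful of rules that can be read in full.
small_tree = DecisionTreeClassifier(max_depth=2, random_state=0)
small_tree.fit(iris.data, iris.target)

# export_text renders the fitted tree as indented if/else style rules.
rules = export_text(small_tree, feature_names=iris.feature_names)
print(rules)
print("number of decision rules (leaves):", small_tree.get_n_leaves())
```

Each line ending in "class: ..." is one leaf, so the number of such lines is the number of decision rules the tree encodes.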
Python Example: Pre-Pruning
Limit depth:
from sklearn.tree import DecisionTreeClassifier
# Stop growing once the tree reaches a depth of 5
model = DecisionTreeClassifier(
    max_depth=5
)
Minimum Samples Split
# A node needs at least 20 samples before it may be split
model = DecisionTreeClassifier(
    min_samples_split=20
)
Minimum Samples Leaf
# Every leaf must keep at least 10 training samples
model = DecisionTreeClassifier(
    min_samples_leaf=10
)
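To confirm that these constraints actually shape the fitted tree, the following sketch combines all three and inspects the result. The synthetic dataset is an assumption for illustration; tree_, children_left, and n_node_samples are attributes of scikit-learn's fitted tree structure.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic noisy data (an assumption for illustration).
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=4, flip_y=0.2, random_state=0
)

model = DecisionTreeClassifier(
    max_depth=5, min_samples_split=20, min_samples_leaf=10, random_state=0
).fit(X, y)

tree = model.tree_
is_leaf = tree.children_left == -1  # leaves have no children (marked with -1)

smallest_leaf = tree.n_node_samples[is_leaf].min()
smallest_split_node = tree.n_node_samples[~is_leaf].min()

print("depth:", model.get_depth())
print("smallest leaf size:", smallest_leaf)
print("smallest node that was split:", smallest_split_node)
```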
Python Example: Cost Complexity Pruning
model = DecisionTreeClassifier(
    ccp_alpha=0.01
)
Here, ccp_alpha is the complexity penalty α. Larger values prune more branches; the scikit-learn default of 0.0 applies no pruning.
Choosing the Right Pruning Level
Typical workflow:
Train Tree
↓
Try Multiple Alpha Values
↓
Validate Performance
↓
Select Best Tree
Cross-validation is commonly used.
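The workflow above can be sketched with scikit-learn as follows. This is one reasonable setup rather than the only one: it takes candidate alpha values from cost_complexity_pruning_path, scores each with 5-fold cross-validation through GridSearchCV, and keeps the best. The dataset and split sizes are assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic noisy data (an assumption for illustration).
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=4, flip_y=0.2, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Reference: the fully grown, unpruned tree.
full_tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Candidate alphas: the values at which the full tree would lose a subtree.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(
    X_train, y_train
)
# Drop the last alpha (it prunes down to the root alone) and clip any tiny
# negative values caused by floating-point rounding.
candidate_alphas = np.unique(np.clip(path.ccp_alphas[:-1], 0.0, None))

# 5-fold cross-validation over the candidates, using training data only.
search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid={"ccp_alpha": candidate_alphas},
    cv=5,
)
search.fit(X_train, y_train)

best_alpha = search.best_params_["ccp_alpha"]
best_tree = search.best_estimator_  # refit on all training data with best_alpha

print(f"best ccp_alpha: {best_alpha:.5f}")
print(f"leaves: full={full_tree.get_n_leaves()}, pruned={best_tree.get_n_leaves()}")
print(f"test accuracy: full={full_tree.score(X_test, y_test):.2f}, "
      f"pruned={best_tree.score(X_test, y_test):.2f}")
```

The test set is touched only once, at the end, so the reported test accuracy is not influenced by the choice of alpha.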
Advantages of Tree Pruning
- Reduces overfitting
- Improves generalization
- Simplifies trees
- Faster predictions
- Better interpretability
Limitations of Tree Pruning
- Excessive pruning can cause underfitting
- Requires parameter tuning
- Optimal pruning level varies by dataset
Real-World Applications
Healthcare
Simpler diagnostic models.
Finance
Interpretable loan approval systems.
Insurance
Risk assessment models.
Marketing
Customer segmentation.
Fraud Detection
Reducing complexity while maintaining accuracy.
Common Mistakes
Growing Unlimited Trees
Often leads to severe overfitting.
Excessive Pruning
May remove useful information.
Using Default Parameters Blindly
In scikit-learn, the defaults place no limit on depth and apply no pruning, so these settings should be tuned for each dataset.
Ignoring Validation Performance
Always evaluate on unseen data.
Best Practices
- Start with depth constraints
- Use cross-validation
- Monitor training and validation accuracy, and keep the test set for the final check
- Experiment with pruning parameters
- Prefer simpler trees when performance is similar
Pre-Pruning vs Post-Pruning
| Aspect | Pre-Pruning | Post-Pruning |
|---|---|---|
| When Applied | During Growth | After Growth |
| Training Speed | Faster | Slower |
| Underfitting Risk | Higher | Lower |
| Exploration | Limited | Complete |
| Accuracy | Sometimes Lower | Often Better |
Tree Pruning Workflow
- Build Decision Tree
- Detect overfitting
- Apply pruning strategy
- Remove weak branches
- Evaluate performance
- Select optimal complexity
- Deploy pruned tree
Why Tree Pruning is Important
Tree Pruning is one of the most important techniques for improving Decision Trees because it directly addresses their biggest weakness: overfitting. By removing unnecessary branches, pruning helps trees focus on meaningful patterns rather than memorizing training data.
A well-pruned tree is often more accurate, more stable, and easier to interpret than a fully grown tree. Understanding pruning is essential because the same principles of controlling model complexity appear throughout Machine Learning, including Random Forests, Gradient Boosting, and Neural Networks.
In the next section, we will move to Ensemble Learning, where we learn how combining multiple trees can dramatically improve predictive performance through techniques such as Random Forests and Boosting.