Tree Pruning in Machine Learning

Last updated: Jun 13, 2026

Author: Christy Harshitha Dakarapu

In the previous article, we learned how Decision Trees are built by repeatedly splitting data into smaller and purer groups.

A Decision Tree continues creating branches until:

Nodes become pure
Stopping criteria are reached
No further useful splits exist

This flexibility makes Decision Trees powerful.

However, it also creates a major problem:

Overfitting

A tree can become so large that it starts memorizing training data instead of learning general patterns.

Tree Pruning is one of the main solutions to this problem.

Pruning removes unnecessary branches from a Decision Tree, making it simpler, faster, and better at generalizing to unseen data.

Why Do Decision Trees Overfit?

Consider a student preparing for an exam.

One student learns concepts.

Another memorizes every question from previous exams.

The second student may perform well on familiar questions but struggle with new ones.

Similarly:


Large Tree
      ↓
Memorizes Training Data
      ↓
Poor Generalization

This is overfitting.

Understanding Overfitting in Trees

Suppose we have:


100 Training Samples

A Decision Tree may continue splitting until:


1 Sample Per Leaf

Example:


Root
  ↓
Node
  ↓
Node
  ↓
Node
  ↓
Leaf

The tree becomes extremely deep.

Training accuracy:


100%

Test accuracy:

Often much lower
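The overfitting pattern above can be reproduced with scikit-learn. This is a minimal sketch on synthetic data, not a benchmark: the dataset size, the label noise added through flip_y, and the random seeds are arbitrary assumptions. A tree with no growth limits fits the training set perfectly, and its held-out accuracy is typically noticeably lower. The exact numbers depend on the data.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data; flip_y=0.1 assigns a random class to about 10% of samples (label noise).
X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           flip_y=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# No depth or size limits: the tree keeps splitting until every leaf is pure.
full_tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

train_acc = full_tree.score(X_train, y_train)
test_acc = full_tree.score(X_test, y_test)
print(f"depth={full_tree.get_depth()}, leaves={full_tree.get_n_leaves()}")
print(f"train accuracy={train_acc:.2f}, test accuracy={test_acc:.2f}")
```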

Visualizing an Overfitted Tree


Root
 ├── Branch
 │    ├── Branch
 │    │    ├── Branch
 │    │    │    ├── Branch
 │    │    │    │    └── Leaf

Many unnecessary branches exist.

What is Tree Pruning?

Tree Pruning is the process of removing branches that contribute little to predictive performance.

Goal:


Complex Tree
      ↓
Simpler Tree
      ↓
Better Generalization

Pruning reduces model complexity.

Intuition Behind Pruning

Imagine a company decision process.

Original Process:


Question 1
Question 2
Question 3
Question 4
Question 5
Question 6

Many unnecessary questions.

Pruned Process:


Question 1
Question 2
Question 3

Same outcome.

Much simpler.

Pruning applies the same idea to Decision Trees.

Why Pruning Helps

Pruning:

Reduces overfitting
Improves generalization
Simplifies interpretation
Reduces computational cost
Creates more stable models

Example (illustrative numbers)

Before Pruning:


Training Accuracy = 100%

Test Accuracy = 72%

After Pruning:


Training Accuracy = 95%

Test Accuracy = 85%

Slight reduction in training performance.

Major improvement in test performance.

Types of Tree Pruning

Two major approaches:

Pre-Pruning
Post-Pruning

Pre-Pruning

Pre-Pruning stops tree growth before the tree becomes too large.

Idea:


Prevent Overgrowth

Instead of building a huge tree and trimming it later, we stop growth early.

Pre-Pruning Example

Tree Construction:


Root
  ↓
Split
  ↓
Split
  ↓
Stop

Further splits are prevented.

Common Pre-Pruning Criteria

Maximum Depth

Limit tree depth.

Example:


max_depth = 5

Tree cannot grow beyond 5 levels.
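A quick way to see the effect of max_depth is to fit trees with a few different limits and inspect them with get_depth() and get_n_leaves(). The synthetic dataset and the chosen limits below are illustrative assumptions. Note that scikit-learn counts the root as depth 0, so max_depth=5 allows at most 5 splits along any root-to-leaf path, and therefore at most 32 leaves.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic data with some label noise (flip_y), so an unlimited tree grows deep.
X, y = make_classification(n_samples=500, n_features=20, flip_y=0.1, random_state=0)

results = {}
for limit in [1, 3, 5, None]:
    tree = DecisionTreeClassifier(max_depth=limit, random_state=0).fit(X, y)
    # get_depth() is the longest root-to-leaf path actually built (root = depth 0).
    results[limit] = (tree.get_depth(), tree.get_n_leaves())
    print(f"max_depth={limit}: depth={results[limit][0]}, leaves={results[limit][1]}")
```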

Minimum Samples Split

Require minimum samples before splitting.

Example:


min_samples_split = 20

Nodes with fewer than 20 samples cannot split.
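The rule can be checked on a tiny hypothetical dataset: with min_samples_split=20, a root holding 20 samples may split, while a root holding 19 samples must stay a leaf. The one-feature toy data below is made up for illustration.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Toy 1-D data: ten samples of class 0 followed by ten samples of class 1.
X = np.arange(20).reshape(-1, 1)
y = np.array([0] * 10 + [1] * 10)

# 20 samples at the root: the root meets the threshold and may split.
tree_20 = DecisionTreeClassifier(min_samples_split=20, random_state=0).fit(X, y)

# 19 samples at the root: below the threshold, so no split is attempted.
tree_19 = DecisionTreeClassifier(min_samples_split=20, random_state=0).fit(X[:19], y[:19])

print("20 samples ->", tree_20.get_n_leaves(), "leaves")
print("19 samples ->", tree_19.get_n_leaves(), "leaf")
```

Here tree_20 should split once into two pure leaves, while tree_19 stays a single leaf even though its classes are mixed.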

Minimum Samples Leaf

Require minimum observations in leaf nodes.

Example:


min_samples_leaf = 10

Every leaf must contain at least 10 samples.
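To confirm the constraint on a fitted scikit-learn tree, the low-level tree_ attribute exposes per-node arrays: children_left (equal to -1 for leaves) and n_node_samples (the number of training samples reaching each node). This sketch uses an arbitrary synthetic dataset.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, flip_y=0.1, random_state=0)
tree = DecisionTreeClassifier(min_samples_leaf=10, random_state=0).fit(X, y)

# In the fitted tree_ arrays, a node is a leaf when it has no left child (-1).
is_leaf = tree.tree_.children_left == -1
leaf_sizes = tree.tree_.n_node_samples[is_leaf]
print(f"leaves={is_leaf.sum()}, smallest leaf={leaf_sizes.min()} samples")
```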

Minimum Information Gain

Allow splits only when they provide meaningful improvement.

Example:


Information Gain > Threshold
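Information gain can be computed directly from class counts: the parent's entropy minus the size-weighted entropy of the children. The sketch below is a plain NumPy version; the function names and the threshold of 0.1 bits are hypothetical choices for illustration. In scikit-learn, the closest built-in control is the min_impurity_decrease parameter, which uses the tree's criterion (Gini by default), weights the decrease by the node's share of the samples, and splits when the decrease is greater than or equal to the given value.

```python
import numpy as np


def entropy(labels):
    """Shannon entropy, in bits, of a sequence of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())


def information_gain(parent, left, right):
    """Parent entropy minus the size-weighted entropy of the two children."""
    n = len(parent)
    children = len(left) / n * entropy(left) + len(right) / n * entropy(right)
    return entropy(parent) - children


def should_split(parent, left, right, threshold=0.1):
    """Hypothetical pre-pruning rule: split only if the gain exceeds the threshold."""
    return information_gain(parent, left, right) > threshold


parent = [0, 0, 0, 0, 1, 1, 1, 1]
print(should_split(parent, [0, 0, 0, 0], [1, 1, 1, 1]))  # gain = 1.0 bit -> True
print(should_split(parent, [0, 0, 1, 1], [0, 0, 1, 1]))  # gain = 0.0 -> False
```

A perfectly separating split gains the full 1 bit of parent entropy, while a split that leaves both children 50/50 gains nothing and is rejected.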

Advantages of Pre-Pruning

Faster training
Smaller trees
Lower memory usage
Simple implementation

Limitations of Pre-Pruning

The algorithm may stop too early.

Example:


Potentially Useful Split
      ↓
Never Explored

This can reduce model accuracy.
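A classic case where stopping early hurts is XOR-style data: no single split improves purity, yet two levels of splits separate the classes perfectly. This sketch uses a made-up XOR dataset and scikit-learn's min_impurity_decrease as the gain threshold (it measures Gini impurity rather than information gain, and 0.01 is an arbitrary value).

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Made-up XOR data: the label is 1 exactly when the two binary features differ.
corners = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
X = np.repeat(corners, 10, axis=0)  # 10 copies of each corner
y = X[:, 0] ^ X[:, 1]

# Splitting on either feature alone leaves both children 50/50,
# so the first split reduces impurity by exactly zero.
stopped = DecisionTreeClassifier(min_impurity_decrease=0.01, random_state=0).fit(X, y)

# With no threshold, the zero-gain first split is allowed, and the second
# level of splits then separates the classes perfectly.
full = DecisionTreeClassifier(random_state=0).fit(X, y)

print("pre-pruned:", stopped.get_n_leaves(), "leaf, accuracy", stopped.score(X, y))
print("full tree: ", full.get_n_leaves(), "leaves, accuracy", full.score(X, y))
```

Here the pre-pruned tree should remain a single leaf that predicts one class (50% accuracy), while the full tree should reach 100% training accuracy with four leaves.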

Post-Pruning

Post-Pruning follows a different strategy.

Workflow:


Build Full Tree
      ↓
Evaluate Branches
      ↓
Remove Weak Branches

The tree grows completely before pruning begins.
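One simple way to carry out this workflow is reduced-error pruning. Note that this is a swapped-in illustration, not necessarily the method the author has in mind (the Cost Complexity Pruning described below is what scikit-learn implements). The sketch hand-builds a tiny hypothetical tree (the Node class, feature indices, and validation points are all made up), then works bottom-up: each subtree is tentatively replaced by a leaf predicting its majority training class, and the replacement is kept if validation accuracy does not drop.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """A hypothetical binary tree node; a node with no children is a leaf."""
    prediction: int                # majority training class at this node
    feature: Optional[int] = None  # feature index tested by an internal node
    threshold: float = 0.0         # go left if x[feature] <= threshold
    left: Optional[Node] = None
    right: Optional[Node] = None

    def is_leaf(self) -> bool:
        return self.left is None


def predict(node: Node, x: list[float]) -> int:
    while not node.is_leaf():
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.prediction


def accuracy(root: Node, X: list[list[float]], y: list[int]) -> float:
    return sum(predict(root, x) == label for x, label in zip(X, y)) / len(y)


def count_leaves(node: Node) -> int:
    return 1 if node.is_leaf() else count_leaves(node.left) + count_leaves(node.right)


def reduced_error_prune(root: Node, node: Node, X_val, y_val) -> None:
    """Bottom-up: collapse a subtree into a leaf unless that lowers validation accuracy."""
    if node.is_leaf():
        return
    reduced_error_prune(root, node.left, X_val, y_val)
    reduced_error_prune(root, node.right, X_val, y_val)
    before = accuracy(root, X_val, y_val)
    left, right = node.left, node.right
    node.left = node.right = None  # tentatively turn this node into a leaf
    if accuracy(root, X_val, y_val) < before:
        node.left, node.right = left, right  # pruning hurt: restore the branch


# Fully grown toy tree: the split on feature 1 under the right branch fits noise.
tree = Node(prediction=0, feature=0, threshold=0.5,
            left=Node(prediction=0),
            right=Node(prediction=1, feature=1, threshold=0.5,
                       left=Node(prediction=1), right=Node(prediction=0)))

X_val = [[0, 0], [0, 1], [1, 0], [1, 1]]
y_val = [0, 0, 1, 1]

print(count_leaves(tree), accuracy(tree, X_val, y_val))  # before pruning
reduced_error_prune(tree, tree, X_val, y_val)
print(count_leaves(tree), accuracy(tree, X_val, y_val))  # after pruning
```

Ties are resolved in favour of pruning, so the simpler tree wins whenever it does at least as well on the validation data. In this toy case the noisy branch is removed, while collapsing the root would hurt accuracy and is undone.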

Why Post-Pruning Often Works Better

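The algorithm first grows the full tree, so splits that only pay off deeper in the tree still get a chance to appear.

Then it removes the branches that do not improve performance.

This often produces trees that generalize better, at the cost of extra training time.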

Example

Original Tree:


Root
 ├── Branch A
 ├── Branch B
 └── Branch C
       ├── D
       ├── E
       └── F

After Pruning:


Root
 ├── Branch A
 ├── Branch B
 └── Branch C

Weak branches are removed.

Cost Complexity Pruning

One of the most widely used post-pruning techniques for Decision Trees.

It is the pruning method used in CART (Classification and Regression Trees).

Core Idea

Balance:


Accuracy
     vs
Complexity

A slightly less accurate tree may be preferred if it is significantly simpler.

Cost Complexity Formula

The pruning objective can be expressed as:

$R_\alpha(T)=R(T)+\alpha|T|$

Where:

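$R(T)$ = Error of the tree on the training data (traditionally the misclassification rate of the leaves; scikit-learn uses the total sample-weighted impurity of the leaves)
$|T|$ = Number of leaf nodes
$\alpha$ = Complexity penalty per leaf ($\alpha \ge 0$)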

Larger trees receive larger penalties.
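Plugging numbers into the formula shows how α changes which tree wins. The two candidate trees below (their errors and leaf counts) are hypothetical values chosen for illustration.

```python
def cost_complexity(error, n_leaves, alpha):
    """R_alpha(T) = R(T) + alpha * |T|."""
    return error + alpha * n_leaves


# Hypothetical candidate trees: (training error R(T), number of leaves |T|).
large_tree = (0.05, 20)
small_tree = (0.08, 5)

for alpha in [0.001, 0.01]:
    large = cost_complexity(*large_tree, alpha)
    small = cost_complexity(*small_tree, alpha)
    winner = "large" if large < small else "small"
    print(f"alpha={alpha}: large={large:.3f}, small={small:.3f} -> prefer {winner} tree")

# alpha=0.001: large=0.070, small=0.085 -> prefer large tree
# alpha=0.01: large=0.250, small=0.130 -> prefer small tree
```

The two trees tie when 0.05 + 20α = 0.08 + 5α, that is at α = 0.03 / 15 = 0.002; for any larger α the smaller tree has the lower penalized cost.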

Understanding Alpha (α)

Alpha controls pruning strength.

Small Alpha


Weak Pruning

Larger trees remain.

Large Alpha


Strong Pruning

More branches removed.
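scikit-learn can list the effective α values at which branches of a fully grown tree get pruned, via cost_complexity_pruning_path. Refitting with each α shows the pattern described above: as α grows, the number of leaves never increases, and the largest α should leave only the root. This sketch mirrors the setup of scikit-learn's own pruning example on its bundled breast cancer dataset; the split and seeds are assumptions.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Effective alphas at which minimal cost-complexity pruning removes another branch.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_train, y_train)
alphas = path.ccp_alphas

leaf_counts = []
for alpha in alphas:
    tree = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha).fit(X_train, y_train)
    leaf_counts.append(tree.get_n_leaves())

print(f"smallest alpha={alphas[0]:.4f}: {leaf_counts[0]} leaves")
print(f"largest alpha={alphas[-1]:.4f}: {leaf_counts[-1]} leaves")
```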

Visualizing Pruning

Before:


Large Tree
      ↓
Many Branches

After:


Smaller Tree
      ↓
Essential Branches Only

Pruning and Bias-Variance Tradeoff

Large Tree:


Low Bias
High Variance

Pruned Tree:


Slightly Higher Bias
Lower Variance

This trade often improves performance on unseen data.

Example: Loan Approval

Features:

Credit Score
Income
Employment Status
Age
Location

Overfitted Tree:


Uses Every Detail

Pruned Tree:


Uses Only Important Factors

Generalizes better.

Example: Disease Prediction

Overfitted Tree:

May memorize rare patient cases.

Pruned Tree:

Focuses on meaningful medical patterns.

Pruning and Interpretability

Small trees are easier to understand.

Example:

Before:


25 Decision Rules

After:


5 Decision Rules

Interpretation becomes much simpler.
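Each leaf of a tree corresponds to one root-to-leaf decision rule, so get_n_leaves() counts the rules, and scikit-learn's export_text prints them as indented text. The sketch uses the bundled Iris dataset with max_depth=2 as an arbitrary pruning choice; its rule counts are unrelated to the 25 and 5 in the example above.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
X, y = iris.data, iris.target

full = DecisionTreeClassifier(random_state=0).fit(X, y)
pruned = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# One leaf = one decision rule from the root down to that leaf.
print(f"full tree rules: {full.get_n_leaves()}, pruned tree rules: {pruned.get_n_leaves()}")

# Human-readable rules of the pruned tree.
rules = export_text(pruned, feature_names=list(iris.feature_names))
print(rules)
```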

Python Example: Pre-Pruning

Limit depth:


from sklearn.tree import DecisionTreeClassifier

model = DecisionTreeClassifier(
    max_depth=5
)

Minimum Samples Split


model = DecisionTreeClassifier(
    min_samples_split=20
)

Minimum Samples Leaf


model = DecisionTreeClassifier(
    min_samples_leaf=10
)

Python Example: Cost Complexity Pruning


model = DecisionTreeClassifier(
    ccp_alpha=0.01
)

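Here, ccp_alpha is the α from the cost complexity formula: larger values remove more branches, and the default value of 0.0 performs no cost complexity pruning.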

Choosing the Right Pruning Level

Typical workflow:


Train Tree
      ↓
Try Multiple Alpha Values
      ↓
Validate Performance
      ↓
Select Best Tree

Cross-validation is commonly used.
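The workflow above can be sketched with scikit-learn: take candidate alphas from the pruning path of a full tree, score each with 5-fold cross-validation, and refit with the best one. This is a minimal sketch under stated assumptions: the breast cancer dataset, cv=5, and the seeds are arbitrary, and the candidate alphas come from a tree trained on the whole training split, which is a common shortcut rather than the only valid approach.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Train a full tree and take the candidate alphas from its pruning path.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_train, y_train)
# Drop the last alpha, which prunes the tree down to a single node.
alphas = path.ccp_alphas[:-1]

# Validate each alpha with 5-fold cross-validation on the training split.
cv_scores = [
    cross_val_score(DecisionTreeClassifier(random_state=0, ccp_alpha=a),
                    X_train, y_train, cv=5).mean()
    for a in alphas
]

# Select the alpha with the best mean validation accuracy, then refit.
best_alpha = alphas[int(np.argmax(cv_scores))]
best_tree = DecisionTreeClassifier(random_state=0, ccp_alpha=best_alpha).fit(X_train, y_train)
print(f"best alpha={best_alpha:.4f}, leaves={best_tree.get_n_leaves()}, "
      f"test accuracy={best_tree.score(X_test, y_test):.3f}")
```

The held-out test split is used only once, after the alpha has been chosen, so it still gives an honest estimate of performance on unseen data.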

Advantages of Tree Pruning

Reduces overfitting
Improves generalization
Simplifies trees
Faster predictions
Better interpretability

Limitations of Tree Pruning

Excessive pruning can cause underfitting
Requires parameter tuning
Optimal pruning level varies by dataset

Real-World Applications

Healthcare

Simpler diagnostic models.

Finance

Interpretable loan approval systems.

Insurance

Risk assessment models.

Marketing

Customer segmentation.

Fraud Detection

Reducing complexity while maintaining accuracy.

Common Mistakes

Growing Unlimited Trees

Often leads to severe overfitting.

Excessive Pruning

May remove useful information.

Using Default Parameters Blindly

Every dataset requires tuning.

Ignoring Validation Performance

Always evaluate on unseen data.

Best Practices

Start with depth constraints
Use cross-validation
Monitor train and test accuracy
Experiment with pruning parameters
Prefer simpler trees when performance is similar

Pre-Pruning vs Post-Pruning

Aspect	Pre-Pruning	Post-Pruning
When Applied	During Growth	After Growth
Training Speed	Faster	Slower
Underfitting Risk	Higher	Lower
Exploration	Limited	Complete
Accuracy	Sometimes Lower	Often Better

Tree Pruning Workflow

Build Decision Tree
Detect overfitting
Apply pruning strategy
Remove weak branches
Evaluate performance
Select optimal complexity
Deploy pruned tree

Why Tree Pruning is Important

Tree Pruning is one of the most important techniques for improving Decision Trees because it directly addresses their biggest weakness: overfitting. By removing unnecessary branches, pruning helps trees focus on meaningful patterns rather than memorizing training data.

A well-pruned tree is often more accurate on unseen data, more stable, and easier to interpret than a fully grown tree. Understanding pruning is essential because the same principles of controlling model complexity appear throughout Machine Learning, including Random Forests, Gradient Boosting, and Neural Networks.

In the next section, we will move to Ensemble Learning, where we learn how combining multiple trees can dramatically improve predictive performance through techniques such as Random Forests and Boosting.