In the previous article, we learned how Decision Trees are built by repeatedly splitting data into smaller and purer groups.

A Decision Tree continues creating branches until:

  • Nodes become pure
  • Stopping criteria are reached
  • No further useful splits exist

This flexibility makes Decision Trees powerful.

However, it also creates a major problem:

Overfitting

A tree can become so large that it starts memorizing training data instead of learning general patterns.

Tree Pruning is the solution to this problem.

Pruning removes unnecessary branches from a Decision Tree, making it simpler, faster, and better at generalizing to unseen data.

Why Do Decision Trees Overfit?

Consider a student preparing for an exam.

One student learns concepts.

Another memorizes every question from previous exams.

The second student may perform well on familiar questions but struggle with new ones.

Similarly:

Large Tree
↓
Memorizes Training Data
↓
Poor Generalization

This is overfitting.

Understanding Overfitting in Trees

Suppose we have:

100 Training Samples

A Decision Tree may continue splitting until:

1 Sample Per Leaf

Example:

Root
↓
Node
↓
Node
↓
Node
↓
Leaf

The tree becomes extremely deep.

Training accuracy:

100%

Test accuracy:

Much lower
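To see this behavior in code, here is a minimal sketch using scikit-learn's DecisionTreeClassifier on a synthetic dataset. The dataset, its size, and the label noise (flip_y=0.2) are illustrative assumptions, not part of the original example. With no growth limits, the tree is expected to reach 100% training accuracy while its test accuracy stays noticeably lower.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary data; flip_y randomly reassigns the class of ~20% of samples (noise).
X, y = make_classification(
    n_samples=1000, n_features=10, n_informative=4, flip_y=0.2, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# No depth or size limits: the tree keeps splitting until every leaf is pure.
full_tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

train_acc = full_tree.score(X_train, y_train)
test_acc = full_tree.score(X_test, y_test)
print(f"depth={full_tree.get_depth()}, leaves={full_tree.get_n_leaves()}")
print(f"train accuracy={train_acc:.3f}, test accuracy={test_acc:.3f}")
```

The exact test score depends on the data; what matters is the gap between the two numbers.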

Visualizing an Overfitted Tree

Root
├── Branch
│ ├── Branch
│ │ ├── Branch
│ │ │ ├── Branch
│ │ │ │ └── Leaf

Many unnecessary branches exist.

What is Tree Pruning?

Tree Pruning is the process of removing branches that contribute little to predictive performance.

Goal:

Complex Tree
↓
Simpler Tree
↓
Better Generalization

Pruning reduces model complexity.

Intuition Behind Pruning

Imagine a company decision process.

Original Process:

Question 1
Question 2
Question 3
Question 4
Question 5
Question 6

Many unnecessary questions.

Pruned Process:

Question 1
Question 2
Question 3

Same outcome.

Much simpler.

Pruning applies the same idea to Decision Trees.

Why Pruning Helps

Pruning:

  • Reduces overfitting
  • Improves generalization
  • Simplifies interpretation
  • Reduces computational cost
  • Creates more stable models

Example

Before Pruning:

Training Accuracy = 100%

Test Accuracy = 72%

After Pruning:

Training Accuracy = 95%

Test Accuracy = 85%

Slight reduction in training performance.

Major improvement in test performance.

Types of Tree Pruning

Two major approaches:

  1. Pre-Pruning
  2. Post-Pruning

Pre-Pruning

Pre-Pruning stops tree growth before the tree becomes too large.

Idea:

Prevent Overgrowth

Instead of building a huge tree and trimming it later, we stop growth early.

Pre-Pruning Example

Tree Construction:

Root
↓
Split
↓
Split
↓
Stop

Further splits are prevented.

Common Pre-Pruning Criteria

Maximum Depth

Limit tree depth.

Example:

max_depth = 5

Tree cannot grow beyond 5 levels.

Minimum Samples Split

A node must contain a minimum number of samples before it can be split.

Example:

min_samples_split = 20

Nodes with fewer than 20 samples cannot split.

Minimum Samples Leaf

Require a minimum number of samples in each leaf node.

Example:

min_samples_leaf = 10

Every leaf must contain at least 10 samples.

Minimum Information Gain

Allow splits only when they provide meaningful improvement.

Example:

Information Gain > Threshold
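The four criteria above can be sketched as a single check that a tree builder might run before accepting each split. This is a from-scratch illustration, not a library API: the function names entropy, information_gain, and allow_split are hypothetical, information gain is measured with Shannon entropy, and the default thresholds mirror the examples above (the min_gain value of 0.01 is an assumption). scikit-learn exposes a related, though not identical, control called min_impurity_decrease.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(parent, left, right):
    """Entropy of the parent minus the size-weighted entropy of the two children."""
    n = len(parent)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted


def allow_split(parent, left, right, depth, max_depth=5,
                min_samples_split=20, min_samples_leaf=10, min_gain=0.01):
    """Hypothetical pre-pruning gate: True only if every criterion passes."""
    if depth >= max_depth:                           # Maximum Depth
        return False
    if len(parent) < min_samples_split:              # Minimum Samples Split
        return False
    if min(len(left), len(right)) < min_samples_leaf:  # Minimum Samples Leaf
        return False
    return information_gain(parent, left, right) > min_gain  # Minimum Information Gain


# Toy labels: a 30-sample node with two candidate splits.
parent = [0] * 15 + [1] * 15
good_left, good_right = [0] * 14 + [1], [0] + [1] * 14
flat_left, flat_right = [0] * 8 + [1] * 8, [0] * 7 + [1] * 7

print(allow_split(parent, good_left, good_right, depth=0))  # True
print(allow_split(parent, flat_left, flat_right, depth=0))  # False: no gain
print(allow_split(parent, good_left, good_right, depth=5))  # False: too deep
```

A builder using this gate would turn a node into a leaf as soon as allow_split returns False for its best candidate split.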

Advantages of Pre-Pruning

  • Faster training
  • Smaller trees
  • Lower memory usage
  • Simple implementation

Limitations of Pre-Pruning

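The algorithm may stop too early. Pre-pruning judges each split in isolation, so a split that looks weak by itself may be rejected even though it would enable strong splits further down.

Example:

Potentially Useful Split
↓
Never Explored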

This can reduce model accuracy.
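A classic illustration of this limitation is XOR-style data, where neither feature is informative on its own. The plain-Python sketch below uses a hypothetical gain helper that splits binary features at 0.5. Every root split has zero information gain, so any positive minimum-gain threshold stops growth at the root, even though one more level would classify the data perfectly.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gain(rows, labels, feature):
    """Information gain of splitting a binary feature into 0 / 1 branches."""
    left = [y for x, y in zip(rows, labels) if x[feature] == 0]
    right = [y for x, y in zip(rows, labels) if x[feature] == 1]
    n = len(labels)
    children = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - children


# XOR: the label is 1 exactly when the two features differ.
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [0, 1, 1, 0]

# At the root, neither feature helps on its own...
root_gains = [gain(X, y, f) for f in (0, 1)]
print(root_gains)  # [0.0, 0.0]

# ...but after splitting on feature 0, feature 1 separates the branch perfectly.
left_rows = [x for x in X if x[0] == 0]
left_labels = [label for x, label in zip(X, y) if x[0] == 0]
print(gain(left_rows, left_labels, 1))  # 1.0
```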

Post-Pruning

Post-Pruning follows a different strategy.

Workflow:

Build Full Tree
↓
Evaluate Branches
↓
Remove Weak Branches

The tree grows completely before pruning begins.

Why Post-Pruning Often Works Better

The algorithm first explores all possibilities.

Then it removes unnecessary branches.

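Because each pruning decision is made with the complete subtree in view, a split that looks weak on its own is not discarded before its children have been evaluated. This often produces better trees than pre-pruning.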

Example

Original Tree:

Root
├── Branch A
├── Branch B
└── Branch C
    ├── D
    ├── E
    └── F

After Pruning:

Root
├── Branch A
├── Branch B
└── Branch C

The weak branches D, E, and F are removed, and Branch C becomes a leaf.
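The example above is not tied to a specific algorithm. One simple post-pruning strategy is reduced-error pruning: walk the fully grown tree bottom-up and collapse any subtree into a leaf when doing so does not increase errors on a held-out validation set. The sketch below is a minimal version of that idea using a hypothetical hand-built Node class and toy data, not any library's tree structure. It is a different technique from the cost complexity pruning described next.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    prediction: int                 # majority training class at this node
    feature: Optional[int] = None   # None marks a leaf
    threshold: float = 0.0
    left: Optional["Node"] = None   # branch for x[feature] <= threshold
    right: Optional["Node"] = None  # branch for x[feature] > threshold


def predict(node: Node, x: List[float]) -> int:
    while node.feature is not None:
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.prediction


def count_leaves(node: Node) -> int:
    if node.feature is None:
        return 1
    return count_leaves(node.left) + count_leaves(node.right)


def reduced_error_prune(node: Node, X_val, y_val) -> Node:
    """Bottom-up: replace a subtree with a leaf if that does not increase
    errors on the validation samples that reach it."""
    if node.feature is None:
        return node
    goes_left = [x[node.feature] <= node.threshold for x in X_val]
    node.left = reduced_error_prune(
        node.left,
        [x for x, g in zip(X_val, goes_left) if g],
        [y for y, g in zip(y_val, goes_left) if g],
    )
    node.right = reduced_error_prune(
        node.right,
        [x for x, g in zip(X_val, goes_left) if not g],
        [y for y, g in zip(y_val, goes_left) if not g],
    )
    subtree_errors = sum(predict(node, x) != y for x, y in zip(X_val, y_val))
    leaf_errors = sum(y != node.prediction for y in y_val)
    # Ties (including "no validation samples reach this node") favor the simpler tree.
    if leaf_errors <= subtree_errors:
        return Node(prediction=node.prediction)
    return node


# Hypothetical fully grown tree: the split on feature 1 under the right branch fits noise.
tree = Node(prediction=1, feature=0, threshold=0.5,
            left=Node(prediction=0),
            right=Node(prediction=1, feature=1, threshold=0.5,
                       left=Node(prediction=1), right=Node(prediction=0)))

X_val = [[0.2, 0.1], [0.3, 0.9], [0.8, 0.2], [0.9, 0.7], [0.7, 0.9]]
y_val = [0, 0, 1, 1, 1]

print(count_leaves(tree))    # 3
pruned = reduced_error_prune(tree, X_val, y_val)
print(count_leaves(pruned))  # 2
```

Here the lower split on feature 1 only adds errors on the validation samples, so it is collapsed, while the root split is kept because removing it would cost two validation errors.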

Cost Complexity Pruning

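One of the most widely used pruning techniques for Decision Trees.

Introduced with the CART (Classification and Regression Trees) algorithm, it is also the post-pruning method scikit-learn provides through the ccp_alpha parameter.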

Core Idea

Balance:

Accuracy
vs
Complexity

A slightly less accurate tree may be preferred if it is significantly simpler.

Cost Complexity Formula

The pruning objective can be expressed as:

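$$R_\alpha(T) = R(T) + \alpha |T|$$

Where:

  • $R(T)$ = error of tree $T$ on the training data (scikit-learn uses the total sample-weighted impurity of the leaves)
  • $|T|$ = number of leaf nodes
  • $\alpha \ge 0$ = complexity penalty per leaf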

Larger trees receive larger penalties.
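A quick worked example makes the trade-off concrete. The two candidate trees and their error values below are hypothetical numbers chosen for illustration, and cost_complexity is a hypothetical helper that evaluates the formula directly.

```python
def cost_complexity(error, n_leaves, alpha):
    """Cost-complexity objective: R_alpha(T) = R(T) + alpha * |T|."""
    return error + alpha * n_leaves


# Hypothetical candidate trees: (error R(T), number of leaves |T|).
candidates = {"large": (0.10, 20), "small": (0.14, 5)}

winners = {}
for alpha in (0.001, 0.01):
    costs = {name: cost_complexity(r, leaves, alpha)
             for name, (r, leaves) in candidates.items()}
    winners[alpha] = min(costs, key=costs.get)  # lower cost wins
    summary = ", ".join(f"{name}={c:.3f}" for name, c in costs.items())
    print(f"alpha={alpha}: {summary} -> keep {winners[alpha]}")

# Penalty at which both trees cost the same; above it the smaller tree wins.
(r_large, n_large), (r_small, n_small) = candidates["large"], candidates["small"]
break_even = (r_small - r_large) / (n_large - n_small)
print(f"break-even alpha = {break_even:.5f}")
```

At α = 0.001 the larger tree has the lower cost (0.120 vs 0.145); at α = 0.01 the smaller tree wins (0.300 vs 0.190). The two tie at α ≈ 0.00267, a break-even value that plays the same role as the "effective alpha" cost complexity pruning computes when deciding whether to collapse a subtree.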

Understanding Alpha (α)

Alpha controls pruning strength.

Small Alpha

Weak Pruning

Larger trees remain.

Large Alpha

Strong Pruning

More branches removed.
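To watch alpha change tree size, the sketch below fits scikit-learn trees with several ccp_alpha values on the bundled breast cancer dataset (the dataset and the specific alpha values are illustrative choices). scikit-learn implements minimal cost-complexity pruning, which repeatedly removes the "weakest link" subtree, so for the same data and random_state a larger alpha should never leave more leaves than a smaller one.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

leaf_counts = {}
for alpha in (0.0, 0.001, 0.005, 0.01, 0.05):
    # ccp_alpha=0.0 keeps the fully grown tree; larger values prune harder.
    tree = DecisionTreeClassifier(ccp_alpha=alpha, random_state=0).fit(X, y)
    leaf_counts[alpha] = tree.get_n_leaves()
    print(f"ccp_alpha={alpha:<6} leaves={leaf_counts[alpha]:>3} depth={tree.get_depth()}")
```

With ccp_alpha=0.0 the tree is fully grown; as alpha increases, the leaf count and depth shrink.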

Visualizing Pruning

Before:

Large Tree

Many Branches

After:

Smaller Tree

Essential Branches Only

Pruning and Bias-Variance Tradeoff

Large Tree:

Low Bias
High Variance

Pruned Tree:

Slightly Higher Bias
Lower Variance

Often improves overall performance.

Example: Loan Approval

Features:

  • Credit Score
  • Income
  • Employment Status
  • Age
  • Location

Overfitted Tree:

Uses Every Detail

Pruned Tree:

Uses Only Important Factors

Generalizes better.

Example: Disease Prediction

Overfitted Tree:

May memorize rare patient cases.

Pruned Tree:

Focuses on meaningful medical patterns.

Pruning and Interpretability

Small trees are easier to understand.

Example:

Before:

25 Decision Rules

After:

5 Decision Rules

Interpretation becomes much simpler.
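One way to see the interpretability gain is to print the rules. In the sketch below, which uses scikit-learn's bundled iris dataset and export_text (the max_depth=2 setting is an illustrative choice), each leaf marks the end of one root-to-leaf decision rule, so get_n_leaves() counts the rules.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
X, y = iris.data, iris.target

# Fully grown tree vs. a tree limited to two levels of questions.
full = DecisionTreeClassifier(random_state=0).fit(X, y)
pruned = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# Each leaf is the end of one root-to-leaf decision rule.
print("rules in full tree:  ", full.get_n_leaves())
print("rules in pruned tree:", pruned.get_n_leaves())

# Human-readable if/else rules for the smaller tree.
print(export_text(pruned, feature_names=list(iris.feature_names)))
```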

Python Example: Pre-Pruning

Limit depth:

```python
from sklearn.tree import DecisionTreeClassifier

# Leaves may sit at most 5 levels below the root
model = DecisionTreeClassifier(
    max_depth=5
)
```

Minimum Samples Split

```python
# Nodes with fewer than 20 samples are not split further
model = DecisionTreeClassifier(
    min_samples_split=20
)
```

Minimum Samples Leaf

```python
# Every leaf must keep at least 10 training samples
model = DecisionTreeClassifier(
    min_samples_leaf=10
)
```
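The snippets above only configure the model. The sketch below fits a tree with all three constraints on scikit-learn's bundled breast cancer dataset (an illustrative choice) and reads the fitted tree_ arrays to confirm each constraint holds: children_left is -1 at leaves, and n_node_samples records how many training samples reached each node.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

model = DecisionTreeClassifier(
    max_depth=5, min_samples_split=20, min_samples_leaf=10, random_state=0
).fit(X, y)

t = model.tree_
is_leaf = t.children_left == -1               # leaves have no children
leaf_sizes = t.n_node_samples[is_leaf]        # training samples in each leaf
internal_sizes = t.n_node_samples[~is_leaf]   # samples at nodes that were split

print("depth:", model.get_depth())
print("smallest leaf:", leaf_sizes.min())
print("smallest split node:", internal_sizes.min())
```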

Python Example: Cost Complexity Pruning

```python
# Post-pruning: each leaf adds a penalty of alpha = 0.01
model = DecisionTreeClassifier(
    ccp_alpha=0.01
)
```

Here, ccp_alpha is the α from the cost complexity formula. The default of 0.0 performs no pruning, and larger values prune more aggressively.

Choosing the Right Pruning Level

Typical workflow:

Train Tree
↓
Try Multiple Alpha Values
↓
Validate Performance
↓
Select Best Tree

Cross-validation is commonly used.
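The workflow above can be sketched with scikit-learn's cost_complexity_pruning_path, which returns the effective alphas at which the fully grown tree loses subtrees. The dataset, the 5-fold setting, and the split sizes are illustrative assumptions. The last alpha is dropped because it prunes the tree down to a single node.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

# Train a full tree and get the alphas at which its subtrees get pruned.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_train, y_train)
alphas = path.ccp_alphas[:-1]  # the last alpha leaves only the root

# Try each alpha and validate it with 5-fold cross-validation on training data only.
cv_scores = [
    cross_val_score(DecisionTreeClassifier(ccp_alpha=a, random_state=0),
                    X_train, y_train, cv=5).mean()
    for a in alphas
]

# Select the best alpha, refit on all training data, and check once on the test set.
best_alpha = alphas[int(np.argmax(cv_scores))]
final = DecisionTreeClassifier(ccp_alpha=best_alpha, random_state=0).fit(X_train, y_train)
print(f"best ccp_alpha={best_alpha:.5f}, leaves={final.get_n_leaves()}, "
      f"test accuracy={final.score(X_test, y_test):.3f}")
```

The test set is used only once, after alpha has been chosen, so the final score remains an honest estimate of performance on unseen data.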

Advantages of Tree Pruning

  • Reduces overfitting
  • Improves generalization
  • Simplifies trees
  • Faster predictions
  • Better interpretability

Limitations of Tree Pruning

  • Excessive pruning can cause underfitting
  • Requires parameter tuning
  • Optimal pruning level varies by dataset

Real-World Applications

Healthcare

Simpler diagnostic models.

Finance

Interpretable loan approval systems.

Insurance

Risk assessment models.

Marketing

Customer segmentation.

Fraud Detection

Reducing complexity while maintaining accuracy.

Common Mistakes

Growing Unlimited Trees

Often leads to severe overfitting.

Excessive Pruning

May remove useful information.

Using Default Parameters Blindly

Every dataset requires tuning.

Ignoring Validation Performance

Always evaluate on unseen data.

Best Practices

  • Start with depth constraints
  • Use cross-validation
  • Monitor training and validation accuracy
  • Experiment with pruning parameters
  • Prefer simpler trees when performance is similar

Pre-Pruning vs Post-Pruning

| Aspect | Pre-Pruning | Post-Pruning |
|---|---|---|
| When Applied | During growth | After growth |
| Training Speed | Faster | Slower |
| Risk of Underfitting | Higher | Lower |
| Exploration | Limited | Complete |
| Accuracy | Sometimes lower | Often better |

Tree Pruning Workflow

  1. Build Decision Tree
  2. Detect overfitting
  3. Apply pruning strategy
  4. Remove weak branches
  5. Evaluate performance
  6. Select optimal complexity
  7. Deploy pruned tree

Why Tree Pruning is Important

Tree Pruning is one of the most important techniques for improving Decision Trees because it directly addresses their biggest weakness: overfitting. By removing unnecessary branches, pruning helps trees focus on meaningful patterns rather than memorizing training data.

A well-pruned tree is often more accurate, more stable, and easier to interpret than a fully grown tree. Understanding pruning is essential because the same principles of controlling model complexity appear throughout Machine Learning, including Random Forests, Gradient Boosting, and Neural Networks.

In the next section, we will move to Ensemble Learning, where we learn how combining multiple trees can dramatically improve predictive performance through techniques such as Random Forests and Boosting.