In the previous articles, we learned about:
- Entropy
- Information Gain
- Gini Index
These concepts answer an important question:
How does a Decision Tree decide where to split?
Now it's time to put everything together and understand the complete Decision Tree algorithm.
Decision Trees are among the most intuitive Machine Learning algorithms because they mimic human decision-making.
For example, when deciding whether to approve a loan, a bank manager may think:
Is Credit Score High?
↓
Yes / No
If Yes:
↓
Approve
If No:
↓
Check Income
This sequence of decisions naturally forms a tree structure.
Decision Trees work in much the same way.
They repeatedly ask questions that divide the data into increasingly pure groups until a prediction can be made.
What is a Decision Tree?
A Decision Tree is a supervised Machine Learning algorithm that makes predictions by recursively splitting data into smaller groups using decision rules.
It resembles an upside-down tree.
Example:
        Credit Score?
         /        \
      High        Low
       |           |
    Approve     Income?
                /     \
             High     Low
              |        |
           Approve   Reject
Each question divides the data.
Why is it Called a Tree?
The structure resembles a tree.
Components:
Root Node
↓
Internal Nodes
↓
Leaf Nodes
Components of a Decision Tree
Root Node
The first split.
Example:
Credit Score?
The root node contains the entire dataset.
Internal Nodes
Intermediate decision points.
Example:
Income?
These nodes further divide data.
Leaf Nodes
Final predictions.
Example:
Approve
Reject
No further splitting occurs.
Example Dataset
Suppose we want to predict loan approval.
| Credit Score | Income | Approved |
|---|---|---|
| High | High | Yes |
| High | Medium | Yes |
| Low | High | Yes |
| Low | Low | No |
Target:
Approved
Possible outputs:
Yes
No
How a Decision Tree Learns
The tree asks:
Which feature best separates the classes?
It evaluates:
- Credit Score
- Income
using either:
- Entropy + Information Gain, or
- Gini Index
The best feature becomes the root node.
Step 1: Calculate Impurity
Initially:
Dataset:
Yes
Yes
Yes
No
The classes are mixed (three Yes, one No), so the node is impure: entropy is about 0.811 bits and the Gini Index is 0.375.
The tree calculates either:
- Entropy, or
- Gini Index
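As a rough illustration of this step, the sketch below computes both impurity measures for the four labels listed above. The function names `entropy` and `gini` are chosen here for illustration and are not taken from any library.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def gini(labels):
    """Gini impurity of a list of class labels."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())

labels = ["Yes", "Yes", "Yes", "No"]
root_entropy = entropy(labels)  # about 0.811
root_gini = gini(labels)        # 0.375
print(round(root_entropy, 3), round(root_gini, 3))
```

Both measures are 0 for a pure node and grow as the classes become more evenly mixed.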
Step 2: Try Every Possible Split
Candidate Features:
Credit Score
Income
The algorithm evaluates each feature separately.
Step 3: Choose the Best Split
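Suppose the algorithm computes the following gains (illustrative values, not calculated from the four-row table above):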
| Feature | Information Gain |
|---|---|
| Credit Score | 0.55 |
| Income | 0.25 |
Best Feature:
Credit Score
This becomes the root node.
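To make the comparison concrete, here is a minimal sketch that computes Information Gain for each feature on the four-row loan table, assuming a multiway split (one branch per category value), as in ID3. The helper names `entropy` and `information_gain` are illustrative and not part of any library. The gains it produces come from the actual four rows, so they differ from the illustrative values in the table above: on this tiny sample, Income separates the classes perfectly and scores higher than Credit Score.

```python
from collections import Counter, defaultdict
from math import log2

# The four-row loan dataset from the "Example Dataset" section.
rows = [
    {"Credit Score": "High", "Income": "High", "Approved": "Yes"},
    {"Credit Score": "High", "Income": "Medium", "Approved": "Yes"},
    {"Credit Score": "Low", "Income": "High", "Approved": "Yes"},
    {"Credit Score": "Low", "Income": "Low", "Approved": "No"},
]

def entropy(labels):
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(rows, feature, target="Approved"):
    """Parent entropy minus the size-weighted entropy of the child groups."""
    parent = entropy([r[target] for r in rows])
    groups = defaultdict(list)
    for r in rows:
        groups[r[feature]].append(r[target])
    weighted = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return parent - weighted

gains = {f: information_gain(rows, f) for f in ("Credit Score", "Income")}
best_feature = max(gains, key=gains.get)
for feature, value in gains.items():
    print(f"{feature}: {value:.3f}")
print("Best feature:", best_feature)
```

Whichever feature has the highest gain is chosen for the split; the selection rule is the same regardless of the actual numbers.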
Step 4: Create Child Nodes
Split:
Credit Score
   /    \
High    Low
Data is divided into smaller groups.
Step 5: Repeat Recursively
For each child node:
Calculate Impurity
↓
Find Best Split
↓
Split Again
The process repeats until stopping criteria are met.
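The five steps can be sketched as one recursive function. This is a simplified, ID3-style illustration for categorical features only, not how any particular library implements it; `build_tree`, `gain`, `split`, and the nested-dict tree format are all names and choices made up for this example.

```python
from collections import Counter, defaultdict
from math import log2

def entropy(labels):
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def split(rows, feature):
    """Group rows by their value for the given feature."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[feature]].append(row)
    return groups

def gain(rows, feature, target):
    parent = entropy([r[target] for r in rows])
    children = split(rows, feature).values()
    return parent - sum(
        len(c) / len(rows) * entropy([r[target] for r in c]) for c in children
    )

def build_tree(rows, features, target, depth=0, max_depth=3):
    labels = [r[target] for r in rows]
    majority = Counter(labels).most_common(1)[0][0]
    # Stopping criteria: pure node, no features left, or depth limit reached.
    if len(set(labels)) == 1 or not features or depth >= max_depth:
        return majority  # leaf node: predict the majority class
    best = max(features, key=lambda f: gain(rows, f, target))
    remaining = [f for f in features if f != best]
    return {
        "feature": best,
        "children": {
            value: build_tree(subset, remaining, target, depth + 1, max_depth)
            for value, subset in split(rows, best).items()
        },
    }

rows = [
    {"Credit Score": "High", "Income": "High", "Approved": "Yes"},
    {"Credit Score": "High", "Income": "Medium", "Approved": "Yes"},
    {"Credit Score": "Low", "Income": "High", "Approved": "Yes"},
    {"Credit Score": "Low", "Income": "Low", "Approved": "No"},
]
tree = build_tree(rows, ["Credit Score", "Income"], "Approved")
print(tree)
```

On these four rows the sketch selects Income at the root, because every Income value already maps to a single class. The Credit Score-first tree drawn in this article follows the illustrative gain values instead.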
Example Tree Construction
Initial Dataset:
Loan Applications
First Split:
Credit Score?
Result:
Credit Score?
   /    \
High    Low
For Low Credit Score:
Split Again:
Income?
Result:
     Credit Score?
      /        \
   High        Low
    |           |
   Yes       Income?
             /     \
          High     Low
           |        |
          Yes       No
Tree complete.
Decision Tree Prediction
Suppose:
Applicant:
Credit Score = Low
Income = High
Prediction Path:
Credit Score?
↓
Low
Income?
↓
High
Prediction:
Yes
The tree follows the path from root to leaf.
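The root-to-leaf walk can be sketched in a few lines. The nested-dict representation and the `predict` helper below are assumptions made for illustration; the tree itself is the one constructed in the example above.

```python
# The example tree: internal nodes are dicts, leaves are plain labels.
tree = {
    "feature": "Credit Score",
    "children": {
        "High": "Yes",
        "Low": {
            "feature": "Income",
            "children": {"High": "Yes", "Low": "No"},
        },
    },
}

def predict(node, applicant):
    """Follow the branch matching the applicant's value until a leaf is reached."""
    while isinstance(node, dict):
        node = node["children"][applicant[node["feature"]]]
    return node

applicant = {"Credit Score": "Low", "Income": "High"}
prediction = predict(tree, applicant)
print(prediction)  # Yes
```

Prediction cost depends on the depth of the path taken, not on the size of the training set.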
Why Decision Trees Work
Decision Trees repeatedly reduce impurity.
Goal:
Mixed Data
↓
Pure Groups
Purer groups produce more reliable predictions, because most samples that reach a leaf share the same class.
Decision Trees for Classification
Classification Trees predict categories.
Examples:
- Spam / Not Spam
- Pass / Fail
- Fraud / Genuine
Output:
Class Labels
Example
Student Dataset:
| Study Hours | Result |
|---|---|
| 2 | Fail |
| 8 | Pass |
Tree:
Study Hours > 5?
    /      \
  Yes      No
   |        |
  Pass     Fail
Decision Trees for Regression
Decision Trees can also predict numerical values.
Examples:
- House Prices
- Sales Revenue
- Temperature
Output:
Continuous Numbers
Example
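Area > 1500?
   /      \
 Yes      No
  |        |
 75L      45L
Predicted house prices (L = lakh). In a standard regression tree, each leaf predicts the mean target value of the training samples that reach it.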
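A small sketch of the same idea with scikit-learn's `DecisionTreeRegressor`. The four data points are hypothetical, picked only so that the leaf averages come out to 45 and 75; they are not from a real dataset.

```python
from sklearn.tree import DecisionTreeRegressor

# Hypothetical data: area in square feet, price in lakhs.
areas = [[1000], [1200], [1800], [2000]]
prices = [40, 50, 70, 80]

# max_depth=1 allows a single split, giving a tree with two leaves.
model = DecisionTreeRegressor(max_depth=1, random_state=0)
model.fit(areas, prices)

threshold = model.tree_.threshold[0]            # learned split point on Area
small, large = model.predict([[1400], [1600]])  # one house on each side of the split
print(threshold, small, large)
```

Each prediction is the mean price of the training houses that fall in the same leaf, which is why a regression tree can only output as many distinct values as it has leaves.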
Entropy vs Gini in Decision Trees
Decision Trees commonly use one of two criteria:
- Entropy: splits are scored by Information Gain
- Gini: splits are scored by the reduction in Gini impurity
Both attempt to create purer nodes.
Comparison
| Criterion | Commonly used by |
|---|---|
| Entropy | ID3, C4.5 (via gain ratio) |
| Gini | CART (classification) |
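For reference, the two criteria can be written out as follows. The notation is chosen here for illustration: $p_i$ is the fraction of samples in node $S$ that belong to class $i$, $C$ is the number of classes, $I$ stands for either impurity measure, and $S_k$ are the child nodes produced by splitting on feature $A$.

```latex
H(S) = -\sum_{i=1}^{C} p_i \log_2 p_i
\qquad
G(S) = 1 - \sum_{i=1}^{C} p_i^{2}

\mathrm{Gain}(S, A) = I(S) - \sum_{k} \frac{|S_k|}{|S|}\, I(S_k)
```

With $I = H$ the last expression is Information Gain; with $I = G$ it is the Gini impurity reduction.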
Stopping Criteria
Without restrictions, a Decision Tree keeps splitting until every node is pure or no further split is possible.
This can cause overfitting.
Common stopping conditions:
- Maximum Depth
- Minimum Samples Split
- Minimum Samples Leaf
- Pure Nodes
Example
Maximum Depth = 5
No path from the root to a leaf may contain more than five splits.
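In scikit-learn, these stopping conditions correspond to the `max_depth`, `min_samples_split`, and `min_samples_leaf` parameters of `DecisionTreeClassifier`. The sketch below compares an unrestricted tree with a restricted one on the built-in Iris dataset; the dataset and the specific parameter values are arbitrary choices for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# No limits: the tree grows until its leaves are pure.
unrestricted = DecisionTreeClassifier(random_state=0).fit(X, y)

# Pre-set stopping criteria.
limited = DecisionTreeClassifier(
    max_depth=3,           # at most three splits on any root-to-leaf path
    min_samples_split=10,  # a node needs at least 10 samples to be split
    min_samples_leaf=5,    # every leaf must keep at least 5 samples
    random_state=0,
).fit(X, y)

print("unrestricted:", unrestricted.get_depth(), "levels,", unrestricted.get_n_leaves(), "leaves")
print("limited:     ", limited.get_depth(), "levels,", limited.get_n_leaves(), "leaves")
```

Good values for these parameters depend on the dataset and are usually chosen by validation rather than fixed in advance.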
Decision Tree Visualization
          Root
        /      \
    Node        Node
   /    \      /    \
Leaf   Leaf  Leaf   Leaf
Advantages of Decision Trees
Easy to Understand
Decision rules are human-readable.
Minimal Data Preparation
Trees do not need feature scaling or normalization, because each split depends only on the ordering of a feature's values.
Handles Numerical and Categorical Features
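The algorithm can split on both data types, although some implementations (scikit-learn among them) expect categorical features to be numerically encoded first.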
Feature Importance
Provides insights into influential features.
Works for Classification and Regression
The same algorithm handles both kinds of task.
Limitations of Decision Trees
Overfitting
Trees can memorize training data.
Instability
Small changes in data may create different trees.
Bias Toward Dominant Classes
When classes are imbalanced, splits tend to favor the majority class.
Greedy Learning
Each split is the best available at its own node, so the finished tree is not guaranteed to be globally optimal.
Real-World Example: Loan Approval
Features:
- Credit Score
- Income
- Employment Status
Decision Tree:
Credit Score?
↓
Income?
↓
Approval Decision
Real-World Example: Medical Diagnosis
Features:
- Age
- Blood Pressure
- Cholesterol
Tree:
Blood Pressure?
↓
Age?
↓
Diagnosis
Doctors can easily interpret the rules.
Real-World Example: Customer Churn
Features:
- Monthly Charges
- Contract Type
- Tenure
Decision Tree predicts:
Stay
Leave
Python Implementation
Import:
from sklearn.tree import DecisionTreeClassifier
Create Model:
model = DecisionTreeClassifier(criterion="gini")
Train:
model.fit(X_train, y_train)
Predict:
predictions = model.predict(X_test)
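The snippets above assume that `X_train`, `y_train`, and `X_test` already exist. A self-contained version might look like the following; the Iris dataset, the 75/25 split, and the `random_state` values are assumptions made only so that the example runs as written.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Load a small built-in dataset and hold out 25% of it for testing.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

model = DecisionTreeClassifier(criterion="gini", random_state=42)
model.fit(X_train, y_train)

predictions = model.predict(X_test)
accuracy = accuracy_score(y_test, predictions)
print(f"Test accuracy: {accuracy:.2f}")
```

Fixing `random_state` makes the run reproducible, since the train/test split would otherwise change on every execution.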
Using Entropy
model = DecisionTreeClassifier(criterion="entropy")
Visualizing the Tree
import matplotlib.pyplot as plt
from sklearn.tree import plot_tree
plot_tree(model)  # model must already be fitted
plt.show()
This displays the learned structure.
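When a plot is inconvenient (for example in a terminal or a log file), scikit-learn's `export_text` prints the same structure as indented rules. A minimal sketch, again using the Iris dataset as a stand-in and `max_depth=2` only to keep the output short:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
model = DecisionTreeClassifier(max_depth=2, random_state=0)
model.fit(iris.data, iris.target)

# One line per node: a threshold test for internal nodes, a class for leaves.
rules = export_text(model, feature_names=list(iris.feature_names))
print(rules)
```

Each level of indentation in the output corresponds to one level of the tree.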
Feature Importance
Decision Trees automatically estimate feature importance.
Example:
| Feature | Importance |
|---|---|
| Credit Score | 0.55 |
| Income | 0.30 |
| Age | 0.15 |
A feature's importance reflects how much the splits on that feature reduce impurity across the whole tree. The scores are normalized, so they sum to 1.
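In scikit-learn, these scores are available on a fitted tree through the `feature_importances_` attribute. The sketch below uses the Iris dataset as a stand-in, so the feature names and values will not match the loan example in the table.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
model = DecisionTreeClassifier(random_state=0).fit(iris.data, iris.target)

# Pair each feature name with its impurity-based importance score.
importances = dict(zip(iris.feature_names, model.feature_importances_))
for name, score in sorted(importances.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {score:.2f}")
```

A score of 0 means the tree never split on that feature. It does not necessarily mean the feature is uninformative.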
Common Applications
Healthcare
Disease prediction.
Finance
Credit risk assessment.
Marketing
Customer segmentation.
E-Commerce
Purchase prediction.
Fraud Detection
Identifying suspicious transactions.
Common Mistakes
Allowing Unlimited Depth
Often leads to overfitting, because the tree can keep splitting until it memorizes the training data.
Ignoring Class Imbalance
Can create biased trees.
Assuming Complex Trees Are Better
Larger trees often generalize worse.
Not Validating Performance
Always evaluate on unseen data.
Best Practices
- Limit tree depth
- Use pruning techniques
- Monitor overfitting
- Evaluate feature importance
- Use cross-validation
- Compare with ensemble methods
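Several of these practices can be combined in a few lines. The sketch below uses 5-fold cross-validation to compare an unrestricted tree with a depth-limited tree and a cost-complexity-pruned tree (`ccp_alpha`); the dataset and the parameter values are assumptions for illustration, not recommendations.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

candidates = {
    "unlimited depth": DecisionTreeClassifier(random_state=0),
    "max_depth=3": DecisionTreeClassifier(max_depth=3, random_state=0),
    "ccp_alpha=0.01": DecisionTreeClassifier(ccp_alpha=0.01, random_state=0),
}

mean_scores = {}
for name, model in candidates.items():
    # Each score is the accuracy on one held-out fold.
    scores = cross_val_score(model, X, y, cv=5)
    mean_scores[name] = scores.mean()
    print(f"{name}: {scores.mean():.3f}")
```

Comparing held-out scores, rather than training accuracy, is what reveals whether a larger tree is actually generalizing better.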
Decision Tree Workflow
- Start with entire dataset
- Measure impurity
- Evaluate possible splits
- Select best split
- Create child nodes
- Repeat recursively
- Stop when criteria are met
- Make predictions using root-to-leaf paths
Decision Tree Summary
| Component | Purpose |
|---|---|
| Root Node | First Split |
| Internal Node | Intermediate Decision |
| Leaf Node | Final Prediction |
| Entropy/Gini | Measure Impurity |
| Information Gain | Select Best Split |
Why Decision Trees are Important
Decision Trees are one of the most intuitive and interpretable Machine Learning algorithms. They transform complex prediction problems into a series of simple decision rules that humans can easily understand and visualize.
Their ability to handle both classification and regression tasks, work with different types of features, and provide transparent reasoning makes them widely used across industries. At the same time, understanding Decision Trees is essential because they form the foundation of powerful ensemble methods such as Random Forests and Gradient Boosting.
In the next article, we will study Tree Pruning, the technique used to prevent Decision Trees from growing too large and overfitting the training data.