In the previous articles, we learned about:

  • Entropy
  • Information Gain
  • Gini Index

These concepts answer an important question:

How does a Decision Tree decide where to split?

Now it's time to put everything together and understand the complete Decision Tree algorithm.

Decision Trees are among the most intuitive Machine Learning algorithms because they mimic human decision-making.

For example, when deciding whether to approve a loan, a bank manager may think:

Is Credit Score High?

Yes / No

If Yes:

Approve

If No:

Check Income

This sequence of decisions naturally forms a tree structure.

Decision Trees work in exactly the same way.

They repeatedly ask questions that divide the data into increasingly pure groups until a prediction can be made.

What is a Decision Tree?

A Decision Tree is a supervised Machine Learning algorithm that makes predictions by recursively splitting data into smaller groups using decision rules.

It resembles an upside-down tree.

Example:

            Credit Score?
              /      \
           High      Low
            |         |
         Approve   Income?
                    /    \
                 High    Low
                  |       |
               Approve  Reject

Each question divides the data.

Why is it Called a Tree?

It starts from a single root at the top and branches out toward leaves at the bottom.

Components:

  • Root Node
  • Internal Nodes
  • Leaf Nodes

Components of a Decision Tree

Root Node

The topmost node, where the first split is made.

Example:

Credit Score?

The root node contains the entire dataset.

Internal Nodes

Intermediate decision points.

Example:

Income?

These nodes further divide the data.

Leaf Nodes

Terminal nodes that hold the final predictions.

Example:

Approve

Reject

No further splitting occurs.

Example Dataset

Suppose we want to predict loan approval.

Credit Score   Income   Approved
High           High     Yes
High           Medium   Yes
Low            High     Yes
Low            Low      No

Target:

Approved

Possible outputs:

Yes

No

How a Decision Tree Learns

The tree asks:

Which feature best separates the classes?

It evaluates:

  • Credit Score
  • Income

using:

  • Entropy + Information Gain
    or
  • Gini Index

The best feature becomes the root node.

Step 1: Calculate Impurity

Initially:

The target column of the dataset contains:

Yes, Yes, Yes, No

The classes are mixed (three Yes, one No).

Impurity is relatively high.

The tree calculates:

  • Entropy
    or
  • Gini Index
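As a quick check, here is a minimal Python sketch that computes both impurity measures for the root node, whose target values are Yes, Yes, Yes, No. The helper names `entropy` and `gini` are our own, not part of any library.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())


root = ["Yes", "Yes", "Yes", "No"]
print(f"Entropy: {entropy(root):.3f}")  # 0.811
print(f"Gini:    {gini(root):.3f}")     # 0.375
```

A pure node (for example, only Yes labels) scores 0 on both measures. This is the target that the splits described below work toward.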

Step 2: Try Every Possible Split

Candidate Features:

Credit Score

Income

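The algorithm evaluates each feature separately (and, for numerical features, each candidate threshold), measuring how much the split would reduce impurity.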

Step 3: Choose the Best Split

Suppose:

Feature        Information Gain
Credit Score   0.55
Income         0.25

Best Feature:

Credit Score

This becomes the root node.

Step 4: Create Child Nodes

Split:

      Credit Score
        /     \
     High     Low

Data is divided into smaller groups.

Step 5: Repeat Recursively

For each child node:

Calculate Impurity

Find Best Split

Split Again

The process repeats until stopping criteria are met.
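The five steps above can be sketched as a short recursive function. This is a minimal ID3-style sketch with several assumptions: features are categorical, each split creates one branch per feature value (multiway), splits are scored with entropy-based Information Gain, and a hypothetical `max_depth` argument serves as the stopping criterion. The helper names (`information_gain`, `build_tree`) are illustrative. Library implementations such as scikit-learn's CART work differently and use binary splits.

```python
from collections import Counter
from math import log2


def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(rows, labels, feature):
    """Parent entropy minus the weighted entropy of the groups created by `feature`."""
    n = len(labels)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    weighted = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - weighted


def build_tree(rows, labels, features, depth=0, max_depth=5):
    # Stopping criteria: pure node, no features left, or depth limit reached.
    # The node then becomes a leaf that predicts the majority class.
    if len(set(labels)) == 1 or not features or depth == max_depth:
        return Counter(labels).most_common(1)[0][0]
    # Steps 2-3: try every feature and keep the one with the highest gain.
    best = max(features, key=lambda f: information_gain(rows, labels, f))
    node = {"feature": best, "children": {}}
    remaining = [f for f in features if f != best]
    # Steps 4-5: create one child per value and recurse on its subset.
    for value in sorted({row[best] for row in rows}):
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        node["children"][value] = build_tree(
            [rows[i] for i in idx], [labels[i] for i in idx],
            remaining, depth + 1, max_depth,
        )
    return node


# The four-row loan table from the Example Dataset section.
rows = [
    {"Credit Score": "High", "Income": "High"},
    {"Credit Score": "High", "Income": "Medium"},
    {"Credit Score": "Low", "Income": "High"},
    {"Credit Score": "Low", "Income": "Low"},
]
labels = ["Yes", "Yes", "Yes", "No"]
features = ["Credit Score", "Income"]

for f in features:
    print(f, round(information_gain(rows, labels, f), 3))
print(build_tree(rows, labels, features))
```

When run on this tiny table, the sketch gives Income a higher gain (about 0.811, because every Income value maps to a single class) than Credit Score (about 0.311), so it places Income at the root. The Information Gain figures in Step 3 should therefore be read as illustrative values, not numbers computed from this table.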

Example Tree Construction

Initial Dataset:

Loan Applications

First Split:

Credit Score?

Result:

      Credit Score?
        /      \
     High      Low

For Low Credit Score:

Split Again:

Income?

Result:

      Credit Score?
        /      \
     High      Low
      |         |
     Yes     Income?
              /    \
           High    Low
            |       |
           Yes      No

Tree complete.

Decision Tree Prediction

Suppose:

Applicant:

Credit Score = Low

Income = High

Prediction Path:

Credit Score? → Low → Income? → High

Prediction: Yes

The tree follows the path from root to leaf.
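Walking from the root to a leaf takes only a few lines of code. The minimal sketch below encodes the finished tree from the Example Tree Construction section as nested dictionaries. The dictionary layout and the `predict` helper are our own illustrative choices, not a library format.

```python
# Internal nodes name a feature and map each of its values to a child;
# leaves are plain class labels.
tree = {
    "feature": "Credit Score",
    "children": {
        "High": "Yes",
        "Low": {
            "feature": "Income",
            "children": {"High": "Yes", "Low": "No"},
        },
    },
}


def predict(node, applicant):
    """Follow the branch that matches the applicant's value until a leaf is reached."""
    while isinstance(node, dict):
        value = applicant[node["feature"]]
        node = node["children"][value]
    return node


applicant = {"Credit Score": "Low", "Income": "High"}
print(predict(tree, applicant))  # Yes
```

This sketch raises a `KeyError` for a value that never appeared during training. A practical implementation would need a fallback for that case, for example the majority class of the current node.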

Why Decision Trees Work

Decision Trees repeatedly reduce impurity.

Goal:

Mixed Data → Pure Groups

Pure groups produce reliable predictions.

Decision Trees for Classification

Classification Trees predict categories.

Examples:

  • Spam / Not Spam
  • Pass / Fail
  • Fraud / Genuine

Output:

Class Labels

Example

Student Dataset:

Study Hours   Result
2             Fail
8             Pass

Tree:

   Study Hours > 5?
      /       \
    Yes        No
     |          |
    Pass       Fail
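Here is a minimal scikit-learn sketch that fits a classification tree to the two-row student table. scikit-learn places each numerical threshold halfway between neighboring feature values. Its rule `study_hours <= 5.0` is therefore the same question as "Study Hours > 5?" with the branches swapped. The feature name `study_hours` is our own label.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# One feature (study hours) and a Pass/Fail label.
X = [[2], [8]]
y = ["Fail", "Pass"]

model = DecisionTreeClassifier(random_state=0)
model.fit(X, y)

# Print the learned rule as indented text.
print(export_text(model, feature_names=["study_hours"]))
print(model.predict([[3], [7]]))  # ['Fail' 'Pass']
```

With only two rows, any threshold between 2 and 8 would separate the classes. The midpoint rule is simply how scikit-learn chooses among them.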

Decision Trees for Regression

Decision Trees can also predict numerical values.

Examples:

  • House Prices
  • Sales Revenue
  • Temperature

Output:

Continuous Numbers

Example

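     Area > 1500?
      /       \
    Yes        No
     |          |
    75L        45L

Each leaf holds a predicted house price (75L and 45L, i.e. 75 lakh and 45 lakh). In a regression tree, a leaf typically predicts the average target value of the training samples that reach it.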
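The regression case can be sketched with scikit-learn's `DecisionTreeRegressor`. The areas and prices below are made-up numbers (prices in lakhs), chosen only to illustrate the idea. Limiting the tree to `max_depth=1` gives a single question, like the tree above. By default, the regressor picks the split that minimizes squared error, and each leaf predicts the mean price of its training samples.

```python
from sklearn.tree import DecisionTreeRegressor

# Hypothetical data: house area (sq ft) and price (lakhs).
X = [[1000], [1200], [1400], [1600], [1800], [2000]]
y = [40, 45, 50, 70, 75, 80]

# A single split, like the one-question tree above.
model = DecisionTreeRegressor(max_depth=1, random_state=0)
model.fit(X, y)

print(model.tree_.threshold[0])         # 1500.0 (split point on area)
print(model.predict([[1300], [1900]]))  # [45. 75.] (mean price in each leaf)
```

The left leaf averages 40, 45, and 50, and the right leaf averages 70, 75, and 80. This is why a regression tree's predictions are piecewise constant.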

Entropy vs Gini in Decision Trees

Decision Trees commonly use:

Entropy

Based on:

Information Gain

Gini

Based on:

Impurity Reduction

Both attempt to create purer nodes.
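For reference, here are the standard definitions behind both criteria. They apply to a node S whose k classes occur with proportions p_1, ..., p_k, split into subsets S_v:

```latex
H(S) = -\sum_{i=1}^{k} p_i \log_2 p_i
\qquad
G(S) = 1 - \sum_{i=1}^{k} p_i^2

\mathrm{IG}(S) = H(S) - \sum_{v} \frac{|S_v|}{|S|}\, H(S_v)
\qquad
\Delta G(S) = G(S) - \sum_{v} \frac{|S_v|}{|S|}\, G(S_v)
```

Both quantities are zero for a pure node, and a split is chosen to maximize the reduction on the right-hand side.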

Comparison

Criterion   Used By
Entropy     ID3, C4.5
Gini        CART

Stopping Criteria

Without restrictions:

Decision Trees continue splitting until every node becomes pure.

This can cause overfitting.

Common stopping conditions:

  • Maximum Depth
  • Minimum Samples Split
  • Minimum Samples Leaf
  • Pure Nodes

Example

Maximum Depth = 5

The tree cannot grow more than five levels below the root node.
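In scikit-learn, the first three stopping conditions above correspond to the `max_depth`, `min_samples_split`, and `min_samples_leaf` parameters of `DecisionTreeClassifier`. Splitting until nodes are pure is the default behavior when no limit is set. This sketch uses the built-in Iris dataset only as example data, and the limits 3, 10, and 5 are arbitrary choices.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# No restrictions: keeps splitting until leaves are pure (or cannot be split).
full = DecisionTreeClassifier(random_state=42).fit(X, y)

# Stopping criteria expressed as constructor parameters.
limited = DecisionTreeClassifier(
    max_depth=3,           # Maximum Depth
    min_samples_split=10,  # a node needs at least 10 samples to be split
    min_samples_leaf=5,    # every leaf must keep at least 5 samples
    random_state=42,
).fit(X, y)

for name, model in [("full", full), ("limited", limited)]:
    print(name, "depth:", model.get_depth(), "leaves:", model.get_n_leaves())

# Leaves are the nodes without a left child (marked -1).
is_leaf = limited.tree_.children_left == -1
print("smallest leaf:", limited.tree_.n_node_samples[is_leaf].min())
```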

Decision Tree Visualization

           Root
         /      \
      Node      Node
      /  \      /  \
   Leaf  Leaf Leaf  Leaf

Advantages of Decision Trees

Easy to Understand

Decision rules are human-readable.

Minimal Data Preparation

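Often requires less preprocessing. For example, features do not need scaling or normalization, because splits depend only on how values are ordered.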

Handles Numerical and Categorical Features

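The algorithm can split on both data types, although some implementations, such as scikit-learn's, expect categorical features to be numerically encoded first.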

Feature Importance

Provides insights into influential features.

Works for Classification and Regression

Very flexible algorithm.

Limitations of Decision Trees

Overfitting

Trees can memorize training data.

Instability

Small changes in data may create different trees.

Bias Toward Dominant Classes

Class imbalance can affect splits.

Greedy Learning

Each split is chosen because it is the best at that step (locally optimal), so the final tree is not guaranteed to be globally optimal.

Real-World Example: Loan Approval

Features:

  • Credit Score
  • Income
  • Employment Status

Decision Tree:

Credit Score? → Income? → Approval Decision

Real-World Example: Medical Diagnosis

Features:

  • Age
  • Blood Pressure
  • Cholesterol

Tree:

Blood Pressure? → Age? → Diagnosis

Doctors can easily interpret the rules.

Real-World Example: Customer Churn

Features:

  • Monthly Charges
  • Contract Type
  • Tenure

Decision Tree predicts:

Stay

Leave

Python Implementation

Import:

from sklearn.tree import DecisionTreeClassifier

Create Model:

model = DecisionTreeClassifier(
    criterion="gini"
)

Train:

model.fit(X_train, y_train)

Predict:

predictions = model.predict(X_test)

Using Entropy

model = DecisionTreeClassifier(
    criterion="entropy"
)

Visualizing the Tree

import matplotlib.pyplot as plt
from sklearn.tree import plot_tree

plot_tree(model)
plt.show()

This draws the learned tree structure using matplotlib.
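The snippets above assume that `X_train`, `y_train`, and `X_test` already exist. Here is a self-contained sketch of the whole workflow. It uses scikit-learn's built-in Iris dataset as stand-in data. The split ratio, `max_depth=3`, and `random_state` values are arbitrary. `export_text` prints the tree as text, which is handy when matplotlib is not available.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

# Example data: the built-in Iris dataset stands in for X and y.
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.25, random_state=42, stratify=iris.target
)

model = DecisionTreeClassifier(criterion="entropy", max_depth=3, random_state=42)
model.fit(X_train, y_train)

# Evaluate on data the tree has never seen.
predictions = model.predict(X_test)
accuracy = accuracy_score(y_test, predictions)
print("Test accuracy:", round(accuracy, 3))

# Text rendering of the learned rules.
print(export_text(model, feature_names=list(iris.feature_names)))
```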

Feature Importance

Decision Trees automatically estimate feature importance from how much each feature reduces impurity across all the splits where it is used.

Example:

Feature        Importance
Credit Score   0.55
Income         0.30
Age            0.15

Features with higher importance account for a larger share of the total impurity reduction.
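In scikit-learn, these scores are available through the fitted model's `feature_importances_` attribute. It holds each feature's normalized share of the total impurity reduction, so the values sum to 1. This sketch uses the Iris dataset as example data, since the loan table above is only illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
model = DecisionTreeClassifier(max_depth=3, random_state=42).fit(iris.data, iris.target)

# Pair each feature name with its importance, largest first.
ranked = sorted(
    zip(iris.feature_names, model.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, importance in ranked:
    print(f"{name:20s} {importance:.2f}")
```

A feature the tree never splits on receives an importance of 0, even if it is informative. Another feature might have simply been chosen first.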

Common Applications

Healthcare

Disease prediction.

Finance

Credit risk assessment.

Marketing

Customer segmentation.

E-Commerce

Purchase prediction.

Fraud Detection

Identifying suspicious transactions.

Common Mistakes

Allowing Unlimited Depth

Leads to severe overfitting.

Ignoring Class Imbalance

Can create biased trees.

Assuming Complex Trees Are Better

Larger trees often generalize worse.

Not Validating Performance

Always evaluate on unseen data.

Best Practices

  • Limit tree depth
  • Use pruning techniques
  • Monitor overfitting
  • Evaluate feature importance
  • Use cross-validation
  • Compare with ensemble methods
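Several of these practices can be combined: limit the depth, then let cross-validation choose the limit. This minimal sketch uses scikit-learn's `GridSearchCV` on the Iris dataset as example data. The candidate values for `max_depth` and `min_samples_leaf` are arbitrary.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Score every combination of depth limit and leaf size with 5-fold cross-validation.
search = GridSearchCV(
    DecisionTreeClassifier(random_state=42),
    param_grid={"max_depth": [2, 3, 4, 5, None], "min_samples_leaf": [1, 5, 10]},
    cv=5,
)
search.fit(X, y)

print("Best parameters:", search.best_params_)
print("Mean CV accuracy:", round(search.best_score_, 3))
```

`None` in the grid means "no depth limit", so the search also checks whether an unrestricted tree validates better than a shallow one.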

Decision Tree Workflow

  1. Start with entire dataset
  2. Measure impurity
  3. Evaluate possible splits
  4. Select best split
  5. Create child nodes
  6. Repeat recursively
  7. Stop when criteria are met
  8. Make predictions using root-to-leaf paths

Decision Tree Summary

Component          Purpose
Root Node          First Split
Internal Node      Intermediate Decision
Leaf Node          Final Prediction
Entropy / Gini     Measure Impurity
Information Gain   Select Best Split

Why Decision Trees are Important

Decision Trees are one of the most intuitive and interpretable Machine Learning algorithms. They transform complex prediction problems into a series of simple decision rules that humans can easily understand and visualize.

Their ability to handle both classification and regression tasks, work with different types of features, and provide transparent reasoning makes them widely used across industries. At the same time, understanding Decision Trees is essential because they form the foundation of powerful ensemble methods such as Random Forests and Gradient Boosting.

In the next article, we will study Tree Pruning, the technique used to prevent Decision Trees from growing too large and overfitting the training data.