Decision Tree Algorithm in Machine Learning

Last updated: Jun 13, 2026

Author: Christy Harshitha Dakarapu

In the previous articles, we learned about:

Entropy
Information Gain
Gini Index

These concepts answer an important question:

How does a Decision Tree decide where to split?

Now it's time to put everything together and understand the complete Decision Tree algorithm.

Decision Trees are among the most intuitive Machine Learning algorithms because they mimic human decision-making.

For example, when deciding whether to approve a loan, a bank manager may think:


Is Credit Score High?
          ↓
      Yes / No

If Yes:
      ↓
Approve

If No:
      ↓
Check Income

This sequence of decisions naturally forms a tree structure.

Decision Trees work in exactly the same way.

They repeatedly ask questions that divide the data into increasingly pure groups until a prediction can be made.

What is a Decision Tree?

A Decision Tree is a supervised Machine Learning algorithm that makes predictions by recursively splitting data into smaller groups using decision rules.

It resembles an upside-down tree.

Example:


            Credit Score?
             /       \
           High      Low
            |          |
       Approve      Income?
                    /     \
                 High     Low
                   |        |
               Approve   Reject

Each question divides the data.
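One way to read the diagram above is as a set of nested rules. The sketch below hand-codes the example tree as a plain Python function; the function name `approve_loan` and the string values are invented for illustration and are not something a library produces.

```python
def approve_loan(credit_score: str, income: str) -> str:
    """Hand-written version of the example tree; each `if` is one node."""
    if credit_score == "High":   # root node: Credit Score?
        return "Approve"         # leaf
    if income == "High":         # internal node: Income?
        return "Approve"         # leaf
    return "Reject"              # leaf


# Follows the path Credit Score = Low -> Income = High
print(approve_loan("Low", "High"))
```

A learned Decision Tree has this same shape; the difference is that the questions and their order are chosen from data rather than written by hand.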

Why is it Called a Tree?

The structure resembles a tree.

Components:


Root Node
    ↓
Internal Nodes
    ↓
Leaf Nodes

Components of a Decision Tree

Root Node

The first split.

Example:


Credit Score?

The root node contains the entire dataset.

Internal Nodes

Intermediate decision points.

Example:


Income?

These nodes further divide data.

Leaf Nodes

Final predictions.

Example:


Approve

Reject

No further splitting occurs.

Example Dataset

Suppose we want to predict loan approval.

Credit Score	Income	Approved
High	High	Yes
High	Medium	Yes
Low	High	Yes
Low	Low	No

Target:


Approved

Possible outputs:


Yes

No

How a Decision Tree Learns

The tree asks:


Which feature best separates the classes?

It evaluates:

Credit Score
Income

using:

Entropy + Information Gain
or
Gini Index

The best feature becomes the root node.

Step 1: Calculate Impurity

Initially:

Dataset:


Yes
Yes
Yes
No

The classes are mixed (three Yes, one No), so impurity is well above zero.

The tree calculates:

Entropy
or
Gini Index
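As a rough illustration of Step 1, both measures can be computed for the four labels above with the standard library alone. The helper names `entropy` and `gini` are chosen for this sketch.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


labels = ["Yes", "Yes", "Yes", "No"]
print(round(entropy(labels), 3))  # 0.811
print(round(gini(labels), 3))     # 0.375
```

For two classes, entropy ranges from 0 to 1 and Gini from 0 to 0.5, so both values indicate a node that is mixed but not evenly split.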

Step 2: Try Every Possible Split

Candidate Features:


Credit Score

Income

The algorithm evaluates each feature separately.

Step 3: Choose the Best Split

Suppose, for illustration (these values are hypothetical, not computed from the four-row table above):

Feature	Information Gain
Credit Score	0.55
Income	0.25

Best Feature:


Credit Score

This becomes the root node.
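A minimal sketch of Steps 2 and 3, run on the four-row example table. The helper names are made up for this sketch, and it assumes one branch per category value, which is only one possible splitting scheme. The 0.55 and 0.25 figures in the table above are illustrative; on the actual four rows the computed gains come out differently, and Income happens to separate the classes perfectly.

```python
from collections import Counter, defaultdict
from math import log2

rows = [
    {"Credit Score": "High", "Income": "High", "Approved": "Yes"},
    {"Credit Score": "High", "Income": "Medium", "Approved": "Yes"},
    {"Credit Score": "Low", "Income": "High", "Approved": "Yes"},
    {"Credit Score": "Low", "Income": "Low", "Approved": "No"},
]


def entropy(labels):
    n = len(labels)
    return sum(-(c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(rows, feature, target="Approved"):
    """Parent entropy minus the size-weighted entropy of the child groups."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[feature]].append(row[target])
    parent = entropy([row[target] for row in rows])
    children = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return parent - children


# Step 2: evaluate every candidate feature. Step 3: keep the best one.
gains = {f: information_gain(rows, f) for f in ("Credit Score", "Income")}
best = max(gains, key=gains.get)
print(gains, best)
```

With so few rows, a feature with more distinct values can look better simply because it creates smaller groups, which is one reason tiny examples should be read as illustrations of the procedure rather than of typical results.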

Step 4: Create Child Nodes

Split:


Credit Score
      / \
   High Low

Data is divided into smaller groups.

Step 5: Repeat Recursively

For each child node:


Calculate Impurity
      ↓
Find Best Split
      ↓
Split Again

The process repeats until stopping criteria are met.
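The five steps can be sketched as one recursive function. This is a simplified ID3-style builder under stated assumptions: categorical features only, one branch per value, entropy as the impurity measure, and a six-row dataset invented for this example (it is not the article's table). All names are hypothetical.

```python
from collections import Counter, defaultdict
from math import log2


def entropy(labels):
    n = len(labels)
    return sum(-(c / n) * log2(c / n) for c in Counter(labels).values())


def split(rows, feature):
    """Group rows by their value for `feature`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[feature]].append(row)
    return groups


def build_tree(rows, features, target, depth=0, max_depth=3):
    labels = [row[target] for row in rows]
    majority = Counter(labels).most_common(1)[0][0]
    # Stopping criteria: pure node, no features left, or depth limit reached.
    if len(set(labels)) == 1 or not features or depth == max_depth:
        return majority  # leaf: predict the majority class

    def weighted_child_entropy(feature):
        groups = split(rows, feature).values()
        return sum(len(g) / len(rows) * entropy([r[target] for r in g]) for g in groups)

    # Lowest weighted child entropy is the same as highest information gain.
    best = min(features, key=weighted_child_entropy)
    remaining = [f for f in features if f != best]
    return {
        "feature": best,
        "children": {
            value: build_tree(group, remaining, target, depth + 1, max_depth)
            for value, group in split(rows, best).items()
        },
    }


# Hypothetical loan data, invented for this sketch.
data = [
    {"Credit Score": "High", "Income": "High", "Approved": "Yes"},
    {"Credit Score": "High", "Income": "Low", "Approved": "Yes"},
    {"Credit Score": "High", "Income": "Low", "Approved": "Yes"},
    {"Credit Score": "Low", "Income": "High", "Approved": "Yes"},
    {"Credit Score": "Low", "Income": "Low", "Approved": "No"},
    {"Credit Score": "Low", "Income": "Low", "Approved": "No"},
]
tree = build_tree(data, ["Credit Score", "Income"], "Approved")
print(tree)
```

On this data the builder splits on Credit Score first and then on Income within the Low branch. Production libraries differ in important ways, for example by using binary threshold splits on numeric features.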

Example Tree Construction

Initial Dataset:


Loan Applications

First Split:


Credit Score?

Result:


      Credit Score?
       /        \
    High       Low

For Low Credit Score:

Split Again:


Income?

Result:


      Credit Score?
       /        \
    High       Low
      |          |
    Yes       Income?
              /     \
           High     Low
             |       |
            Yes      No

Tree complete.

Decision Tree Prediction

Suppose:

Applicant:


Credit Score = Low

Income = High

Prediction Path:


Credit Score?
      ↓
Low

Income?
      ↓
High

Prediction:
Yes

The tree follows the path from root to leaf.
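The root-to-leaf walk can be sketched as a short loop. The nested-dictionary layout below is an assumption made for this example, not a standard format; it encodes the tree from the construction example.

```python
tree = {
    "feature": "Credit Score",
    "children": {
        "High": "Yes",
        "Low": {"feature": "Income", "children": {"High": "Yes", "Low": "No"}},
    },
}


def predict(node, applicant):
    """Walk from the root to a leaf, returning the label and the path taken."""
    path = []
    while isinstance(node, dict):        # internal node: ask its question
        value = applicant[node["feature"]]
        path.append(f"{node['feature']} = {value}")
        node = node["children"][value]   # follow the matching branch
    return node, path                    # a non-dict node is a leaf label


label, path = predict(tree, {"Credit Score": "Low", "Income": "High"})
print(label, path)
```

In this sketch a value that has no branch (for example Income = Medium under Low credit) raises a `KeyError`; a real implementation needs a policy for unseen values.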

Why Decision Trees Work

Decision Trees repeatedly reduce impurity.

Goal:


Mixed Data
      ↓
Pure Groups

Pure groups produce reliable predictions.

Decision Trees for Classification

Classification Trees predict categories.

Examples:

Spam / Not Spam
Pass / Fail
Fraud / Genuine

Output:


Class Labels

Example

Student Dataset:

Study Hours	Result
2	Fail
8	Pass

Tree:


Study Hours > 5?
      /      \
    Yes      No
     |        |
   Pass     Fail

Decision Trees for Regression

Decision Trees can also predict numerical values.

Examples:

House Prices
Sales Revenue
Temperature

Output:


Continuous Numbers

Example


Area > 1500?
      /      \
    Yes      No
     |        |
   75L      45L

Each leaf holds a predicted house price (here "L" most likely denotes lakh, i.e. 100,000 in the Indian numbering system).
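For regression, a common approach is to pick the threshold that minimizes the squared error within the two groups and to predict each group's mean. The sketch below assumes that approach and uses area/price pairs invented for illustration, with prices in the same units as the tree above.

```python
def mean(values):
    return sum(values) / len(values)


def sse(values):
    """Sum of squared errors around the mean (0 for an empty group)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


# Hypothetical (area, price) pairs, invented for this sketch.
houses = [(900, 40), (1200, 45), (1400, 50), (1600, 70), (2000, 75), (2400, 80)]


def best_threshold(data):
    """Try a threshold midway between each pair of consecutive areas."""
    areas = sorted(a for a, _ in data)
    candidates = [(lo + hi) / 2 for lo, hi in zip(areas, areas[1:])]

    def cost(t):
        left = [p for a, p in data if a <= t]
        right = [p for a, p in data if a > t]
        return sse(left) + sse(right)

    return min(candidates, key=cost)


t = best_threshold(houses)
left_mean = mean([p for a, p in houses if a <= t])   # prediction for Area <= t
right_mean = mean([p for a, p in houses if a > t])   # prediction for Area > t
print(t, left_mean, right_mean)
```

On this invented data the chosen threshold is 1500, and the two leaf predictions are the group means, 45 and 75.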

Entropy vs Gini in Decision Trees

Decision Trees commonly use:

Entropy

Based on:


Information Gain

Gini

Based on:


Impurity Reduction

Both attempt to create purer nodes.
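For reference, the two criteria can be written as follows, where p_i is the proportion of class i in node S, k is the number of classes, and S_v is the subset of S sent to child v by a split on feature A. The notation is chosen here and may differ from the earlier articles.

```latex
H(S) = -\sum_{i=1}^{k} p_i \log_2 p_i
\qquad
\mathrm{Gini}(S) = 1 - \sum_{i=1}^{k} p_i^2

\mathrm{IG}(S, A) = H(S) - \sum_{v} \frac{|S_v|}{|S|}\, H(S_v)

\Delta\mathrm{Gini}(S, A) = \mathrm{Gini}(S) - \sum_{v} \frac{|S_v|}{|S|}\, \mathrm{Gini}(S_v)
```

Both criteria have the same form: parent impurity minus the size-weighted impurity of the children. Only the impurity function changes.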

Comparison

Criterion	Commonly Used By
Entropy	ID3, C4.5
Gini	CART

Stopping Criteria

Without restrictions, a Decision Tree keeps splitting until every node is pure or no further split is possible.

This can cause overfitting.

Common stopping conditions:

Maximum Depth
Minimum Samples Split
Minimum Samples Leaf
Pure Nodes

Example


Depth = 5

The tree cannot grow beyond five levels.
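In scikit-learn, the stopping conditions listed above correspond to constructor parameters. A small sketch on synthetic data (generated here only to have something to fit) compares an unrestricted tree with a restricted one; the specific values 5, 10, and 5 are arbitrary choices for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic data, used only for demonstration.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

unrestricted = DecisionTreeClassifier(random_state=0).fit(X, y)
restricted = DecisionTreeClassifier(
    max_depth=5,           # Maximum Depth
    min_samples_split=10,  # Minimum Samples Split
    min_samples_leaf=5,    # Minimum Samples Leaf
    random_state=0,
).fit(X, y)

print("unrestricted:", unrestricted.get_depth(), unrestricted.get_n_leaves())
print("restricted:  ", restricted.get_depth(), restricted.get_n_leaves())
```

Suitable values depend on the dataset and are usually chosen by validation rather than fixed in advance.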

Decision Tree Visualization


        Root
       /    \
     Node   Node
     / \     / \
   Leaf Leaf Leaf Leaf

Advantages of Decision Trees

Easy to Understand

Decision rules are human-readable.

Minimal Data Preparation

Trees need no feature scaling or normalization, because each split depends only on the ordering of a feature's values.

Handles Numerical and Categorical Features

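The algorithm can split on both data types. Some implementations, including scikit-learn's, expect numeric input, so categorical features must be encoded first.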

Feature Importance

Provides insights into influential features.

Works for Classification and Regression

The same recursive procedure handles both tasks; only the split criterion and the leaf prediction change.

Limitations of Decision Trees

Overfitting

Trees can memorize training data.

Instability

Small changes in data may create different trees.

Bias Toward Dominant Classes

With imbalanced classes, impurity measures are dominated by the majority class, so splits may favor it.

Greedy Learning

Each split is locally optimal, not globally optimal.

Real-World Example: Loan Approval

Features:

Credit Score
Income
Employment Status

Decision Tree:


Credit Score?
      ↓
Income?
      ↓
Approval Decision

Real-World Example: Medical Diagnosis

Features:

Age
Blood Pressure
Cholesterol

Tree:


Blood Pressure?
       ↓
Age?
       ↓
Diagnosis

Doctors can easily interpret the rules.

Real-World Example: Customer Churn

Features:

Monthly Charges
Contract Type
Tenure

Decision Tree predicts:


Stay

Leave

Python Implementation

Import:


from sklearn.tree import DecisionTreeClassifier

Create Model:


model = DecisionTreeClassifier(
    criterion="gini"
)

Train:


model.fit(X_train, y_train)

Predict:


predictions = model.predict(X_test)
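Putting the steps together, here is a minimal end-to-end sketch on the four-row loan table from earlier. The ordinal codes for the categories are an assumption made for this example, and with only four rows the model is fitted on all of them rather than on a separate training split.

```python
from sklearn.tree import DecisionTreeClassifier

# Ordinal codes chosen for this sketch (an assumption, not from the article).
credit_code = {"Low": 0, "High": 1}
income_code = {"Low": 0, "Medium": 1, "High": 2}

table = [
    ("High", "High", "Yes"),
    ("High", "Medium", "Yes"),
    ("Low", "High", "Yes"),
    ("Low", "Low", "No"),
]
X_train = [[credit_code[c], income_code[i]] for c, i, _ in table]
y_train = [label for _, _, label in table]

model = DecisionTreeClassifier(criterion="gini", random_state=0)
model.fit(X_train, y_train)

# New applicant: Credit Score = Low, Income = High
X_test = [[credit_code["Low"], income_code["High"]]]
predictions = model.predict(X_test)
print(predictions)
```

On a real dataset, the rows would be divided into training and test sets first, for example with `train_test_split` from `sklearn.model_selection`.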

Using Entropy


model = DecisionTreeClassifier(
    criterion="entropy"
)

Visualizing the Tree


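import matplotlib.pyplot as plt
from sklearn.tree import plot_tree

plot_tree(model)  # draws the fitted tree on the current matplotlib figure
plt.show()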

This displays the learned structure.
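When a plotting backend is not available, scikit-learn's `export_text` prints the learned rules as indented text. The sketch below refits a small tree on the loan table using ordinal codes that are assumptions of this example (Low=0, Medium=1, High=2 for Income; Low=0, High=1 for Credit Score).

```python
from sklearn.tree import DecisionTreeClassifier, export_text

X = [[1, 2], [1, 1], [0, 2], [0, 0]]  # [credit score, income] as ordinal codes
y = ["Yes", "Yes", "Yes", "No"]

model = DecisionTreeClassifier(random_state=0).fit(X, y)
rules = export_text(model, feature_names=["Credit Score", "Income"])
print(rules)
```

Each line of the output is one node, with indentation showing depth, so the text can be read in the same way as the diagrams in this article.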

Feature Importance

Decision Trees automatically estimate feature importance.

Example:

Feature	Importance
Credit Score	0.55
Income	0.30
Age	0.15

Features with higher importance account for more of the total impurity reduction across the tree's splits.
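In scikit-learn, a fitted tree exposes these scores through the `feature_importances_` attribute. A short sketch using the bundled Iris dataset (chosen only because it ships with the library; it is unrelated to the loan example):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
model = DecisionTreeClassifier(random_state=0).fit(iris.data, iris.target)

# One score per feature, in the same order as the input columns.
for name, score in zip(iris.feature_names, model.feature_importances_):
    print(f"{name}: {score:.2f}")

importance_total = float(model.feature_importances_.sum())
print(importance_total)
```

The scores are normalized to sum to 1, and a feature that is never used for a split receives 0.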

Common Applications

Healthcare

Disease prediction.

Finance

Credit risk assessment.

Marketing

Customer segmentation.

E-Commerce

Purchase prediction.

Fraud Detection

Identifying suspicious transactions.

Common Mistakes

Allowing Unlimited Depth

Often leads to overfitting, because the tree can keep splitting until it fits noise in the training data.

Ignoring Class Imbalance

Can create biased trees.

Assuming Complex Trees Are Better

Larger trees often generalize worse.

Not Validating Performance

Always evaluate on unseen data.

Best Practices

Limit tree depth
Use pruning techniques
Monitor overfitting
Evaluate feature importance
Use cross-validation
Compare with ensemble methods
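Several of these practices can be combined in a few lines. The sketch below, which uses the bundled Iris dataset and arbitrary depth values as assumptions, applies 5-fold cross-validation to compare trees of different depths.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

for depth in (1, 3, None):  # None means no depth limit
    model = DecisionTreeClassifier(max_depth=depth, random_state=0)
    scores = cross_val_score(model, X, y, cv=5)  # accuracy on 5 held-out folds
    print(depth, round(scores.mean(), 3))
```

Comparing held-out scores across depths is a simple way to see whether extra depth helps or merely fits the training folds more closely.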

Decision Tree Workflow

Start with entire dataset
Measure impurity
Evaluate possible splits
Select best split
Create child nodes
Repeat recursively
Stop when criteria are met
Make predictions using root-to-leaf paths

Decision Tree Summary

Component	Purpose
Root Node	First Split
Internal Node	Intermediate Decision
Leaf Node	Final Prediction
Entropy/Gini	Measure Impurity
Information Gain	Select Best Split

Why Decision Trees are Important

Decision Trees are one of the most intuitive and interpretable Machine Learning algorithms. They transform complex prediction problems into a series of simple decision rules that humans can easily understand and visualize.

Their ability to handle both classification and regression tasks, work with different types of features, and provide transparent reasoning makes them widely used across industries. At the same time, understanding Decision Trees is essential because they form the foundation of powerful ensemble methods such as Random Forests and Gradient Boosting.

In the next article, we will study Tree Pruning, the technique used to prevent Decision Trees from growing too large and overfitting the training data.