In the previous article, we learned about Bagging (Bootstrap Aggregating), where multiple models are trained on different bootstrap samples and their predictions are combined.

We also discovered that Decision Trees benefit greatly from Bagging because they tend to have high variance.

This naturally leads to one of the most successful Machine Learning algorithms ever created:

Random Forest

Random Forest builds upon Bagging and introduces an additional idea:

Random Feature Selection

This simple improvement makes the trees in the forest more diverse, and the forest as a whole more robust and often more accurate than a standard bagged collection of Decision Trees.

Today, Random Forest is widely used in:

  • Finance
  • Healthcare
  • Fraud Detection
  • Recommendation Systems
  • Customer Analytics

because it provides excellent performance with relatively little tuning.

What is Random Forest?

Random Forest is an ensemble learning algorithm that combines multiple Decision Trees and aggregates their predictions.

The name comes from:

Many Decision Trees → Forest

Instead of relying on a single tree:

One Tree → Prediction

Random Forest uses:

Tree 1, Tree 2, Tree 3, Tree 4, ..., Tree N → Combined Prediction

Why Not Use One Decision Tree?

Decision Trees have a major weakness:

High Variance

Small changes in data can produce very different trees.

Example:

Dataset A → Tree A

Dataset B → Tree B

Predictions may differ significantly.

Random Forest reduces this instability.

Core Idea Behind Random Forest

Random Forest combines:

Bagging
+
Random Feature Selection

This creates a collection of diverse trees.

Diverse trees make different mistakes.

Combining them improves overall performance.

Step 1: Bootstrap Sampling

Random Forest starts by creating multiple bootstrap datasets.

Original Dataset:

A B C D E

Bootstrap Sample 1:

A C C D E

Bootstrap Sample 2:

B B C E D

Bootstrap Sample 3:

A A D E C

Each tree receives a different dataset.
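A minimal sketch of bootstrap sampling with NumPy, using the five-item dataset above. The seed and the variable names are illustrative assumptions, so the samples drawn will not match the ones listed in the text.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # seed chosen only for reproducibility
dataset = np.array(["A", "B", "C", "D", "E"])

# Draw three bootstrap samples: each has the same size as the original
# dataset and is sampled WITH replacement, so items can repeat or be missing.
bootstrap_samples = [
    rng.choice(dataset, size=len(dataset), replace=True) for _ in range(3)
]

for i, sample in enumerate(bootstrap_samples, start=1):
    print(f"Bootstrap Sample {i}:", " ".join(sample))
```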

Step 2: Train Multiple Trees

Each bootstrap dataset trains a separate Decision Tree.

Workflow:

Dataset 1 → Tree 1

Dataset 2 → Tree 2

Dataset 3 → Tree 3

So far, this is standard Bagging.

Step 3: Random Feature Selection

This is the key innovation.

Suppose we have:

Age
Salary
Experience
Education
Credit Score

Five features.

When building a split:

A normal Decision Tree considers:

All Features

Random Forest considers:

Random Subset

Example:

Salary
Education

Only these features compete for the split. A fresh random subset is drawn at every split, not once per tree.
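The idea can be sketched in a few lines of NumPy. The helper name `candidate_features_for_split` is hypothetical, and the subset size of roughly the square root of the feature count is an assumption borrowed from a common default for classification (scikit-learn's `max_features="sqrt"`).

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # seed chosen only for reproducibility
features = ["Age", "Salary", "Experience", "Education", "Credit Score"]

# Assumed subset size: about sqrt(number of features), here int(sqrt(5)) = 2.
n_candidates = max(1, int(np.sqrt(len(features))))

def candidate_features_for_split():
    """Return the random subset of features allowed to compete at one split."""
    return rng.choice(features, size=n_candidates, replace=False).tolist()

# Each split gets its own freshly drawn subset.
for split in range(1, 4):
    print(f"Split {split}:", candidate_features_for_split())
```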

Why Random Feature Selection Helps

Without randomness:

If one feature is a much stronger predictor than the others, most trees choose it for their first split.

Example:

Credit Score

becomes the root node in nearly every tree.

Trees become highly similar.

Random feature selection forces diversity.

Example

Tree 1:

Root = Credit Score

Tree 2:

Root = Salary

Tree 3:

Root = Education

Trees become less correlated.
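A rough experiment can make this visible. The sketch below uses synthetic data, which is an assumption for illustration: column 0 plays the role of a dominant feature such as Credit Score, and the other four columns are noise. It trains one forest whose splits consider all features (equivalent to plain bagging) and one whose splits consider two random features, then records which feature each tree uses at its root.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n_samples = 200
y = rng.integers(0, 2, size=n_samples)

# Column 0 separates the two classes on its own; columns 1-4 are pure noise.
dominant = y + rng.uniform(-0.3, 0.3, size=n_samples)
noise = rng.normal(size=(n_samples, 4))
X = np.column_stack([dominant, noise])

def root_features(max_features):
    """Train 50 trees and return the feature index used at each tree's root."""
    forest = RandomForestClassifier(
        n_estimators=50, max_features=max_features, random_state=0
    ).fit(X, y)
    return [int(tree.tree_.feature[0]) for tree in forest.estimators_]

bagged_roots = root_features(None)  # every split considers all 5 features
forest_roots = root_features(2)     # every split considers 2 random features

print("Distinct roots with all features:  ", sorted(set(bagged_roots)))
print("Distinct roots with random subsets:", sorted(set(forest_roots)))
```

With all features available, every tree starts from the dominant column. With random subsets, the dominant column is often not among the candidates, so other features get a chance to lead.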

Step 4: Prediction

Each tree generates a prediction.

Example:

Tree 1 → Fraud

Tree 2 → Fraud

Tree 3 → Genuine

Tree 4 → Fraud

Tree 5 → Fraud

Majority Voting

Votes:

Fraud = 4

Genuine = 1

Final Prediction:

Fraud
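The vote count above can be reproduced with the standard library. This is a sketch of hard majority voting only; note that scikit-learn's `RandomForestClassifier` averages the class probabilities predicted by the trees rather than counting one vote per tree, which usually but not always gives the same answer.

```python
from collections import Counter

# Predictions from the five trees in the example above.
tree_votes = ["Fraud", "Fraud", "Genuine", "Fraud", "Fraud"]

vote_counts = Counter(tree_votes)
final_prediction, n_votes = vote_counts.most_common(1)[0]

print(dict(vote_counts))  # {'Fraud': 4, 'Genuine': 1}
print(final_prediction)   # Fraud
```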

Regression in Random Forest

For regression:

Predictions are averaged.

Example:

Tree 1 → 45

Tree 2 → 50

Tree 3 → 55

Prediction:

$$\frac{45 + 50 + 55}{3} = 50$$
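The same averaging can be checked in code. The first part repeats the arithmetic above; the second part is a sketch on synthetic data (an assumption for illustration) showing that the prediction of scikit-learn's `RandomForestRegressor` matches the mean of its individual trees' predictions.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# The three tree predictions from the example above.
example_mean = np.mean([45, 50, 55])
print(example_mean)  # 50.0

# Synthetic regression data, used only to have something to fit.
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
forest = RandomForestRegressor(n_estimators=25, random_state=0).fit(X, y)

x_new = X[:1]  # one observation to predict
per_tree = np.array([tree.predict(x_new)[0] for tree in forest.estimators_])
forest_prediction = forest.predict(x_new)[0]

print(per_tree.mean(), forest_prediction)
```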

Random Forest Workflow

Original Dataset → Bootstrap Samples → Multiple Trees → Random Features → Predictions → Voting / Averaging → Final Output

Why Random Forest Works

Each tree sees:

  • Different data (its own bootstrap sample)
  • Different features (a random subset at each split)

Therefore:

Different Errors

Combining predictions reduces overall error.

Variance Reduction

Single Tree:

High Variance

Random Forest:

Lower Variance

because many trees are averaged together.
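A small simulation illustrates the effect. It is a simplified sketch, not a model of real trees: each "tree" is assumed to predict the true value plus independent noise. Real trees in a forest are correlated, so the reduction in practice is smaller than what this idealized case shows.

```python
import numpy as np

rng = np.random.default_rng(0)
true_value = 50.0
n_trials, n_trees = 10_000, 100

# Each row is one trial; each column is one simulated tree whose prediction
# is the true value plus independent Gaussian noise (standard deviation 10).
tree_predictions = true_value + rng.normal(scale=10.0, size=(n_trials, n_trees))

single_tree_var = tree_predictions[:, 0].var()      # variance of one tree
forest_var = tree_predictions.mean(axis=1).var()    # variance of the average

print(f"Single tree variance:    {single_tree_var:.2f}")
print(f"Average of 100 variance: {forest_var:.2f}")
```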

Example: Exam Prediction

Single Tree:

Accuracy = 78%

Random Forest:

Accuracy = 88%

The improvement comes from combining many trees whose individual errors partly cancel out.

Out-of-Bag (OOB) Samples

Remember:

Bootstrap sampling leaves out some observations. On average, each bootstrap sample omits about 37% of the original observations.

Example:

Original:

A B C D E

Bootstrap Sample:

A C C D E

Observation B is excluded, so it becomes an Out-of-Bag sample for the tree trained on this bootstrap sample.
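The sketch below first recovers the Out-of-Bag observation from the example above, then draws one bootstrap sample from a larger index range to show how many observations are typically left out. The dataset size of 1000 and the seed are illustrative assumptions.

```python
import numpy as np

# The example from the text: B never appears in the bootstrap sample.
original = ["A", "B", "C", "D", "E"]
bootstrap_sample = ["A", "C", "C", "D", "E"]
oob = [item for item in original if item not in bootstrap_sample]
print(oob)  # ['B']

# A larger case: sample 1000 row indices with replacement.
rng = np.random.default_rng(0)
n_samples = 1000
bootstrap_idx = rng.integers(0, n_samples, size=n_samples)

# Rows whose index was never drawn are out-of-bag for this tree.
oob_mask = np.ones(n_samples, dtype=bool)
oob_mask[bootstrap_idx] = False
oob_fraction = oob_mask.mean()

print(f"Out-of-bag fraction: {oob_fraction:.3f}")  # close to 1/e, about 0.37
```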

OOB Evaluation

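Out-of-Bag samples can estimate model performance: each observation is predicted using only the trees that did not see it during training, and those predictions are compared with the true labels.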

Benefits:

  • No separate validation set required
  • Efficient evaluation
  • Built into Random Forest

Feature Importance

One major advantage of Random Forest:

Feature Importance

The algorithm estimates how useful each feature is.

Example:

Feature         Importance
Credit Score    0.42
Income          0.30
Age             0.18
Location        0.10

Higher importance means the feature contributed more to the forest's predictions. The scores are relative and sum to 1.

Classification Example

Predict:

Spam

Not Spam

Trees vote.

Majority class wins.

Regression Example

Predict:

House Price

Trees estimate prices.

Average prediction becomes final output.

Advantages of Random Forest

High Accuracy

Often performs well without extensive tuning.

Reduced Overfitting

More robust than a single Decision Tree.

Handles Non-Linear Relationships

Captures complex patterns.

Feature Importance

Provides useful insights.

Works with Large Datasets

Scales reasonably well, because the trees are independent of each other and can be trained in parallel.

Handles Missing Values Better

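More tolerant than many algorithms, although support depends on the implementation: some libraries accept missing values directly, while others require imputation first.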

Limitations of Random Forest

Reduced Interpretability

A forest of hundreds of trees is difficult to explain.

Increased Computational Cost

Training many trees requires more resources.

Larger Memory Usage

Many trees must be stored.

Slower Predictions

Every tree must be evaluated, so prediction is slower than with a single tree.

Random Forest vs Decision Tree

Decision Tree            Random Forest
Single Tree              Many Trees
High Variance            Lower Variance
Easier to Interpret      Harder to Interpret
Faster                   Slower
More Overfitting Risk    Less Overfitting Risk

Random Forest vs Bagging

Bagging               Random Forest
Bootstrap Sampling    Bootstrap Sampling
Multiple Trees        Multiple Trees
Uses All Features     Uses Random Features
Less Diversity        More Diversity

Random Forest is essentially Bagging of Decision Trees with one addition: random feature selection at each split.

Choosing Number of Trees

Parameter:

n_estimators

Example:

100 Trees

More trees generally improve stability but increase computation.
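One way to see the effect of `n_estimators` is to train forests of different sizes on the same split and compare their test accuracy. This is a sketch on synthetic data; the dataset, the sizes 1, 10, and 100, and the seeds are illustrative assumptions, and the exact numbers will vary with the data.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic classification data, used only for illustration.
X, y = make_classification(
    n_samples=1000, n_features=20, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

test_accuracy = {}
for n_trees in (1, 10, 100):
    model = RandomForestClassifier(n_estimators=n_trees, random_state=0)
    model.fit(X_train, y_train)
    test_accuracy[n_trees] = model.score(X_test, y_test)
    print(f"{n_trees:>3} trees: test accuracy = {test_accuracy[n_trees]:.3f}")
```

Gains typically flatten out as trees are added, while training and prediction time keep growing roughly in proportion to the number of trees.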

Choosing Maximum Depth

Parameter:

max_depth

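Limits how deep each tree can grow, which controls tree complexity.

Shallower trees help prevent overfitting, at the risk of underfitting if the limit is too low.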
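The sketch below compares a depth-limited forest with an unrestricted one on synthetic data (an assumption for illustration). The limit of 3 is an arbitrary example value, not a recommendation.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic classification data, used only for illustration.
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=5, random_state=0
)

# max_depth=3 caps every tree; max_depth=None lets trees grow fully.
shallow = RandomForestClassifier(n_estimators=20, max_depth=3, random_state=0).fit(X, y)
full = RandomForestClassifier(n_estimators=20, max_depth=None, random_state=0).fit(X, y)

shallow_depths = [tree.get_depth() for tree in shallow.estimators_]
full_depths = [tree.get_depth() for tree in full.estimators_]

print("Deepest tree with max_depth=3:   ", max(shallow_depths))
print("Deepest tree with max_depth=None:", max(full_depths))
print("Training accuracy (shallow):", round(shallow.score(X, y), 3))
print("Training accuracy (full):   ", round(full.score(X, y), 3))
```

Training accuracy alone says little about overfitting, so in practice the depth would be chosen by comparing scores on held-out data or the OOB score.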

Python Implementation

Import:

from sklearn.ensemble import RandomForestClassifier

Create Model:

model = RandomForestClassifier(
    n_estimators=100,  # number of trees in the forest
    random_state=42    # fixed seed for reproducible results
)

Train:

model.fit(X_train, y_train)

Predict:

predictions = model.predict(X_test)
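The snippets above assume that `X_train`, `y_train`, and `X_test` already exist. A self-contained version might look like the sketch below, which uses the breast cancer dataset bundled with scikit-learn; the dataset choice and the 75/25 split are assumptions made for illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Load a small built-in binary classification dataset.
X, y = load_breast_cancer(return_X_y=True)

# Hold out 25% of the rows for testing, keeping class proportions similar.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

predictions = model.predict(X_test)
accuracy = accuracy_score(y_test, predictions)
print(f"Test accuracy: {accuracy:.3f}")
```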

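Out-of-Bag Evaluation

Create Model:

model = RandomForestClassifier(
    n_estimators=100,
    oob_score=True,  # score each observation with trees that did not train on it
    random_state=42
)

Train:

model.fit(X_train, y_train)

View Score:

# oob_score_ is only available after fit() has been called
print(model.oob_score_)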
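To see how the OOB estimate relates to a conventional hold-out estimate, the sketch below computes both on the breast cancer dataset bundled with scikit-learn. The dataset, the split, and the choice of 200 trees are illustrative assumptions.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# oob_score=True asks the forest to compute the out-of-bag accuracy during fit().
model = RandomForestClassifier(n_estimators=200, oob_score=True, random_state=42)
model.fit(X_train, y_train)

oob_accuracy = model.oob_score_               # estimated from training rows only
test_accuracy = model.score(X_test, y_test)   # measured on held-out rows

print(f"OOB accuracy:  {oob_accuracy:.3f}")
print(f"Test accuracy: {test_accuracy:.3f}")
```

The two numbers are usually close, which is why the OOB score is a convenient check when data is too scarce to set aside a validation set.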

Feature Importance

print(model.feature_importances_)
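The raw array is easier to read when paired with feature names and sorted. A minimal sketch, again assuming the breast cancer dataset bundled with scikit-learn:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

data = load_breast_cancer()
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(data.data, data.target)

# One score per feature; the scores are normalized to sum to 1.
importances = model.feature_importances_

# Indices of the five most important features, highest first.
top5 = np.argsort(importances)[::-1][:5]
for idx in top5:
    print(f"{data.feature_names[idx]:<25} {importances[idx]:.3f}")
```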

Real-World Applications

Healthcare

Disease diagnosis.

Finance

Credit scoring.

Fraud Detection

Transaction monitoring.

E-Commerce

Purchase prediction.

Marketing

Customer churn prediction.

Manufacturing

Equipment failure prediction.

Common Mistakes

Using Too Few Trees

Performance may become unstable: with only a handful of trees, predictions can change noticeably from one training run to the next.

Ignoring Hyperparameter Tuning

Depth and tree count matter.

Assuming Feature Importance Means Causation

Importance indicates usefulness, not causality.

Using Random Forest When Explainability is Critical

Single trees may be preferable.

Best Practices

  • Use sufficient trees
  • Monitor OOB score
  • Tune maximum depth
  • Analyze feature importance
  • Validate performance on unseen data

Random Forest Summary

Component             Purpose
Bootstrap Sampling    Dataset Diversity
Multiple Trees        Reduce Variance
Random Features       Tree Diversity
Voting                Classification
Averaging             Regression
OOB Samples           Validation

Random Forest Workflow Summary

  1. Create bootstrap samples
  2. Train multiple trees
  3. Randomly select features at each split
  4. Generate predictions
  5. Vote or average
  6. Produce final prediction
  7. Evaluate performance

Why Random Forest is Important

Random Forest is one of the most widely used Machine Learning algorithms because it combines the simplicity of Decision Trees with the power of ensemble learning. By training many diverse trees and aggregating their predictions, it achieves strong performance while reducing overfitting and improving robustness.

Its ability to handle classification and regression tasks, provide feature importance estimates, and perform well with minimal tuning has made it a standard tool in both industry and research.

In the next article, we will study Boosting Intuition, a fundamentally different ensemble technique where models are trained sequentially and each new model focuses on correcting the mistakes made by previous models.