In the previous article, we learned about Ensemble Learning, where multiple models work together to produce better predictions than a single model.

One of the earliest and most successful ensemble techniques is:

Bagging

Bagging is the technique behind one of the most popular Machine Learning algorithms:

Random Forest

The core idea of Bagging is surprisingly simple:

Instead of training one model,

train many models on slightly different versions of the dataset and combine their predictions.

This approach often produces more stable, accurate, and robust models.

What is Bagging?

Bagging stands for:

Bootstrap Aggregating

The name comes from two concepts:

Bootstrap
+
Aggregating

Bootstrap

Create multiple datasets by sampling from the original dataset.

Aggregating

Combine predictions from multiple models.

Together:

Multiple Datasets

Multiple Models

Combined Prediction

Why Do We Need Bagging?

Suppose we train a single Decision Tree.

Decision Trees are powerful but unstable.

A small change in training data may produce a very different tree.

Example:

Dataset Version 1:

Tree A

Dataset Version 2:

Tree B

Predictions may vary significantly.

This variability is called:

High Variance

Bagging helps reduce variance.

Intuition Behind Bagging

Imagine asking one student to estimate:

Number of candies in a jar

The estimate may be inaccurate.

Now ask:

100 Students

Average their answers.

The combined estimate is often much closer to reality.

Bagging uses the same idea.
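The candy-jar intuition can be checked with a quick simulation. This is only a sketch: the true count, the noise level, and the helper name `simulate_guesses` are made up for illustration, and every guess is assumed to be unbiased and independent of the others.

```python
import random
import statistics

TRUE_COUNT = 500   # hypothetical number of candies in the jar
NOISE_SD = 100     # hypothetical spread of an individual guess


def simulate_guesses(n_students, seed=0):
    """Return n_students independent, unbiased, noisy guesses."""
    rng = random.Random(seed)
    return [rng.gauss(TRUE_COUNT, NOISE_SD) for _ in range(n_students)]


guesses = simulate_guesses(100)

# Typical error of one student versus the error of the averaged guess.
mean_individual_error = statistics.mean([abs(g - TRUE_COUNT) for g in guesses])
crowd_error = abs(statistics.mean(guesses) - TRUE_COUNT)

print(f"average individual error: {mean_individual_error:.1f}")
print(f"error of averaged guess:  {crowd_error:.1f}")
```

The averaged guess can never be further from the truth than the average individual error. How much closer it gets depends on how independent the guesses are.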

Understanding Bootstrap Sampling

Bootstrap sampling is the foundation of Bagging.

Suppose we have a dataset:

Sample
A
B
C
D
E

Bootstrap sampling creates new datasets by randomly selecting samples:

A
C
C
D
E

Notice:

C appears twice
B is missing

This is allowed.

Sampling occurs:

With Replacement

What Does "With Replacement" Mean?

After selecting a sample,

it is returned to the dataset before the next selection.

Example:

Original Dataset:

A B C D E

Pick:

C

Return C.

Dataset remains:

A B C D E

C can be selected again.
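A short sketch of the difference, using only the Python standard library. The variable names are illustrative. `random.choices` samples with replacement, while `random.sample` samples without it.

```python
import random

dataset = ["A", "B", "C", "D", "E"]
rng = random.Random(42)  # fixed seed so the run is repeatable

# With replacement: every draw sees the full dataset, so repeats are possible.
with_replacement = rng.choices(dataset, k=len(dataset))

# Without replacement: each item can be drawn at most once,
# so the result is just a reshuffling of the original dataset.
without_replacement = rng.sample(dataset, k=len(dataset))

print("with replacement:   ", with_replacement)
print("without replacement:", without_replacement)
```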

Multiple Bootstrap Datasets

Original Dataset:

A B C D E

Bootstrap Dataset 1:

A C C D E

Bootstrap Dataset 2:

B B C E D

Bootstrap Dataset 3:

A A D E C

Each dataset is slightly different.
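Datasets like these could be generated along the following lines. `make_bootstrap_datasets` is a hypothetical helper written for this article, not a library function, and the exact samples depend on the random seed.

```python
import random


def make_bootstrap_datasets(data, n_datasets, seed=0):
    """Draw n_datasets bootstrap samples, each the same size as data."""
    rng = random.Random(seed)
    return [rng.choices(data, k=len(data)) for _ in range(n_datasets)]


original = ["A", "B", "C", "D", "E"]
datasets = make_bootstrap_datasets(original, n_datasets=3)

for i, sample in enumerate(datasets, start=1):
    # Items that were never drawn for this particular dataset.
    missing = sorted(set(original) - set(sample))
    print(f"Bootstrap Dataset {i}: {' '.join(sample)}  (missing: {missing})")
```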

Training Multiple Models

Each bootstrap dataset trains a separate model.

Workflow:

Dataset 1 → Model 1

Dataset 2 → Model 2

Dataset 3 → Model 3

Dataset 4 → Model 4

Each model learns slightly different patterns because it is trained on a different sample.

Prediction Phase

After training:

Each model makes a prediction.

Example:

Model 1 → Spam

Model 2 → Spam

Model 3 → Not Spam

Model 4 → Spam

Model 5 → Spam

Majority Voting

Classification problems use voting.

Votes:

Spam = 4

Not Spam = 1

Final Prediction:

Spam
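The vote count above can be reproduced with `collections.Counter`. The function name `majority_vote` is an illustrative choice, not part of any library.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label predicted by the largest number of models."""
    label, _count = Counter(predictions).most_common(1)[0]
    return label


votes = ["Spam", "Spam", "Not Spam", "Spam", "Spam"]
print(Counter(votes))        # Counter({'Spam': 4, 'Not Spam': 1})
print(majority_vote(votes))  # Spam
```

In this sketch a tie goes to whichever label appears first. Using an odd number of models avoids ties in two-class problems.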

Averaging for Regression

For regression tasks:

Predictions are averaged.

Example:

Model 1 → 45

Model 2 → 50

Model 3 → 55

Prediction:

$$\frac{45 + 50 + 55}{3} = 50$$

Final Output:

50
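The same calculation in code, as a minimal sketch. `average_prediction` is a name chosen here for illustration.

```python
from statistics import mean


def average_prediction(predictions):
    """Aggregate regression outputs by taking their arithmetic mean."""
    return mean(predictions)


model_outputs = [45, 50, 55]
print(average_prediction(model_outputs))  # 50
```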

Bagging Workflow

Original Dataset

Bootstrap Samples

Train Multiple Models

Generate Predictions

Vote / Average

Final Prediction
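The whole workflow fits in a few lines of plain Python. The sketch below is not how any library implements Bagging. It assumes a toy one-feature dataset and uses a very simple base model (a threshold rule, often called a decision stump) so that the steps stay visible. All function names are hypothetical.

```python
import random
from collections import Counter


def fit_stump(sample):
    """Fit a one-feature threshold rule: predict 1 if x > threshold, else 0.

    Returns the candidate threshold with the fewest training errors.
    """
    xs = sorted({x for x, _ in sample})
    candidates = [xs[0] - 1] + xs  # xs[0] - 1 predicts 1 for every sample point

    def training_errors(threshold):
        return sum(int(x > threshold) != y for x, y in sample)

    return min(candidates, key=training_errors)


def bagging_fit(data, n_models, seed=0):
    """Draw one bootstrap sample per model and train a stump on it."""
    rng = random.Random(seed)
    return [fit_stump(rng.choices(data, k=len(data))) for _ in range(n_models)]


def bagging_predict(thresholds, x):
    """Collect every model's prediction and return the majority vote."""
    votes = Counter(int(x > t) for t in thresholds)
    return votes.most_common(1)[0][0]


# Toy dataset of (feature, label) pairs: the label is 1 when the feature is 5 or more.
data = [(x, int(x >= 5)) for x in range(10)]

models = bagging_fit(data, n_models=25)
print(bagging_predict(models, 0))  # 0
print(bagging_predict(models, 9))  # 1
```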

Why Bagging Works

Different models make different mistakes.

Example:

Model 1 Wrong

Model 2 Correct

Model 3 Correct

Majority voting still produces the correct answer.

When the models' errors are not strongly correlated, they tend to cancel out.
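How much voting helps can be estimated under an idealised assumption: every model is correct with the same probability and the models' errors are independent. The function below is a sketch under that assumption, and the 0.7 accuracy is an arbitrary example value.

```python
from math import comb


def majority_accuracy(p, n_models):
    """Probability that more than half of n_models independent models are correct.

    Assumes each model is correct with probability p, independently of the
    others. Use an odd n_models so that a tie cannot occur.
    """
    needed = n_models // 2 + 1
    return sum(
        comb(n_models, k) * p**k * (1 - p) ** (n_models - k)
        for k in range(needed, n_models + 1)
    )


for n in (1, 3, 11, 101):
    print(n, round(majority_accuracy(0.7, n), 3))
```

With three such models the majority is right about 78% of the time, compared with 70% for one model. Bootstrap samples overlap heavily, so real bagged models are correlated and the gain is smaller than this calculation suggests.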

Understanding Variance Reduction

Single Decision Tree:

Prediction Changes Easily

High variance.

Bagged Trees:

Average Many Trees

More stable predictions.

Bias vs Variance

Bagging primarily reduces:

Variance

It does not significantly reduce bias.
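A standard result shows why. As a simplification, assume M models whose predictions each have variance σ² and whose pairwise correlation is ρ. These symbols are introduced here for illustration. The variance of the averaged prediction is then:

```latex
\operatorname{Var}\!\left(\frac{1}{M}\sum_{m=1}^{M} \hat{f}_m(x)\right)
  = \rho\,\sigma^{2} + \frac{1-\rho}{M}\,\sigma^{2}
```

The second term shrinks as M grows, but the first does not, which is why less correlated models help more. Averaging does not change the expected prediction of identically distributed models, so the bias stays the same.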

Visualizing Variance Reduction

Single Tree:

/\/\/\/\/\/\

Very sensitive.

Bagged Ensemble:

----------

Smoother and more stable.

Out-of-Bag (OOB) Samples

An interesting property of bootstrap sampling:

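Each bootstrap sample leaves out some observations. For large datasets, roughly 37% of the original observations are missing from any given sample.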

Example:

Original Dataset:

A B C D E

Bootstrap Sample:

A C C D E

Sample:

B

was not selected.

B becomes an:

Out-of-Bag Sample
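Finding the out-of-bag items for a bootstrap sample is a simple set lookup. The sketch below uses a hypothetical helper, `out_of_bag`, and also computes the chance that one particular observation is missed by all n draws, which approaches 1/e (about 0.368) as n grows.

```python
def out_of_bag(original, bootstrap_sample):
    """Return the items of original that never appear in bootstrap_sample."""
    drawn = set(bootstrap_sample)
    return [item for item in original if item not in drawn]


original = ["A", "B", "C", "D", "E"]
bootstrap_sample = ["A", "C", "C", "D", "E"]
print(out_of_bag(original, bootstrap_sample))  # ['B']


def miss_probability(n):
    """Chance that one given observation is never picked in n draws."""
    return (1 - 1 / n) ** n


for n in (5, 100, 10_000):
    print(n, round(miss_probability(n), 3))
```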

Why OOB Samples are Useful

Each model never saw its own OOB samples during training, so they can be used as validation data.

Benefits:

  • No separate validation set required
  • Efficient performance estimation

Decision Trees and Bagging

Bagging works especially well with:

Decision Trees

because trees have:

High Variance

Combining many trees dramatically improves stability.

Example: Loan Approval

Single Tree:

Accuracy = 82%

Bagged Trees:

Accuracy = 89%

In this illustrative example, Bagging raises accuracy by 7 percentage points.

Example: Medical Diagnosis

Single model:

May overreact to unusual patients.

Bagged models:

Average multiple opinions.

Predictions become more reliable.

Example: Stock Prediction

Individual models:

Highly unstable.

Bagging:

Reduces sensitivity to noise in the training data and makes predictions more consistent.

Advantages of Bagging

Reduces Overfitting

Less sensitive to noise.

Improves Accuracy

Often outperforms individual models.

Reduces Variance

Creates more stable predictions.

Parallel Training

Models can be trained simultaneously.

Works with Many Algorithms

Not limited to Decision Trees.

Limitations of Bagging

Increased Computation

Multiple models must be trained.

Higher Memory Usage

Many models need storage.

Less Interpretable

Harder to explain than a single model.

Limited Bias Reduction

Primarily targets variance.

Bagging vs Single Decision Tree

Single Tree            | Bagging
High Variance          | Low Variance
Less Stable            | More Stable
Faster Training        | Slower Training
Easier to Interpret    | Harder to Interpret
More Overfitting Risk  | Less Overfitting Risk

Bagging vs Boosting

Bagging                | Boosting
Parallel Training      | Sequential Training
Reduces Variance       | Reduces Bias
Independent Models     | Dependent Models
Easier to Parallelize  | Harder to Parallelize

Random Forest and Bagging

Random Forest is essentially:

Bagging
+
Decision Trees
+
Random Feature Selection

Bagging forms the foundation of Random Forests.

Python Example

Using Scikit-Learn:

from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

Create Model:

model = BaggingClassifier(
    estimator=DecisionTreeClassifier(),
    n_estimators=100
)

Train:

model.fit(X_train, y_train)

Predict:

predictions = model.predict(X_test)

Using OOB Evaluation

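model = BaggingClassifier(
    estimator=DecisionTreeClassifier(),
    n_estimators=100,
    oob_score=True
)

# The OOB score is only available after fitting
model.fit(X_train, y_train)
print(model.oob_score_)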
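For a version that runs end to end, the sketch below builds a synthetic dataset with `make_classification` and compares the OOB accuracy with the accuracy on a held-out test set. The dataset, the split size, and the random seeds are assumptions made for illustration, so the numbers it prints say nothing about real data.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data stands in for a real dataset (an assumption for this sketch).
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# The base estimator is passed positionally; recent scikit-learn releases
# name this parameter `estimator`.
model = BaggingClassifier(
    DecisionTreeClassifier(),
    n_estimators=100,
    oob_score=True,   # score each sample using only trees that did not see it
    random_state=0,
)
model.fit(X_train, y_train)

oob_accuracy = model.oob_score_
test_accuracy = model.score(X_test, y_test)
print("OOB accuracy: ", oob_accuracy)
print("Test accuracy:", test_accuracy)
```

If the two numbers are close, the OOB score is acting as a reasonable stand-in for a separate validation set.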

Real-World Applications

Fraud Detection

Combining multiple fraud classifiers.

Healthcare

Disease prediction systems.

Finance

Credit risk analysis.

Recommendation Systems

Aggregating user preference models.

Marketing

Customer behavior prediction.

Common Mistakes

Using Too Few Models

Small ensembles may not provide sufficient variance reduction.

Ignoring OOB Evaluation

Useful validation information may be wasted.

Assuming Bagging Solves All Problems

Bagging reduces variance but may not fix high bias.

Best Practices

  • Use enough estimators
  • Monitor OOB score
  • Use high-variance base models such as Decision Trees
  • Compare with Random Forest
  • Validate on unseen data

Bagging Summary

Concept             | Purpose
Bootstrap Sampling  | Create Diverse Datasets
Aggregation         | Combine Predictions
Voting              | Classification
Averaging           | Regression
OOB Samples         | Validation

Bagging Workflow Summary

  1. Create bootstrap samples
  2. Train multiple models
  3. Generate predictions
  4. Aggregate outputs
  5. Produce final prediction
  6. Evaluate performance

Why Bagging is Important

Bagging was one of the first major breakthroughs in ensemble learning because it demonstrated that combining multiple unstable models can dramatically improve predictive performance. By training models on different bootstrap samples and aggregating their predictions, Bagging reduces variance, improves stability, and often achieves higher accuracy than a single model.

Understanding Bagging is essential because it forms the foundation of one of the most successful Machine Learning algorithms ever developed: Random Forest.

In the next article, we will study Random Forest, the ensemble algorithm that combines Bagging with random feature selection to create highly accurate and robust predictive models.