In the previous article, we learned about Ensemble Learning, where multiple models work together to produce better predictions than a single model.
One of the earliest and most successful ensemble techniques is:
Bagging
Bagging is the technique behind one of the most popular Machine Learning algorithms:
Random Forest
The core idea of Bagging is surprisingly simple:
Instead of training one model,
train many models on slightly different versions of the dataset and combine their predictions.
This approach often produces more stable, accurate, and robust models.
What is Bagging?
Bagging stands for:
Bootstrap Aggregating
The name comes from two concepts:
Bootstrap
+
Aggregating
Bootstrap
Create multiple datasets by sampling from the original dataset with replacement.
Aggregating
Combine predictions from multiple models.
Together:
Multiple Datasets
↓
Multiple Models
↓
Combined Prediction
Why Do We Need Bagging?
Suppose we train a single Decision Tree.
Decision Trees are powerful but unstable.
A small change in training data may produce a very different tree.
Example:
Dataset Version 1:
Tree A
Dataset Version 2:
Tree B
Predictions may vary significantly.
This variability is called:
High Variance
Bagging helps reduce variance.
Intuition Behind Bagging
Imagine asking one student to estimate:
Number of candies in a jar
The estimate may be inaccurate.
Now ask:
100 Students
Average their answers.
The combined estimate is often much closer to reality.
Bagging uses the same idea.
Understanding Bootstrap Sampling
Bootstrap sampling is the foundation of Bagging.
Suppose we have a dataset:
| Sample |
|---|
| A |
| B |
| C |
| D |
| E |
Bootstrap sampling creates a new dataset of the same size by randomly selecting samples:
A
C
C
D
E
Notice:
C appears twice
B is missing
This is allowed.
Sampling occurs:
With Replacement
What Does "With Replacement" Mean?
After selecting a sample,
it is returned to the dataset before the next selection.
Example:
Original Dataset:
A B C D E
Pick:
C
Return C.
Dataset remains:
A B C D E
C can be selected again.
Multiple Bootstrap Datasets
Original Dataset:
A B C D E
Bootstrap Dataset 1:
A C C D E
Bootstrap Dataset 2:
B B C E D
Bootstrap Dataset 3:
A A D E C
Each dataset is slightly different.
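The sampling step above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the helper name `bootstrap_sample` and the fixed seed are assumptions made for the example, and the exact letters drawn depend on the random number generator.

```python
import random

def bootstrap_sample(data, rng):
    """Draw len(data) items from data uniformly at random, with replacement."""
    return rng.choices(data, k=len(data))

original = ["A", "B", "C", "D", "E"]
rng = random.Random(42)  # fixed seed only so that repeated runs match

# Three bootstrap datasets, each the same size as the original.
bootstrap_datasets = [bootstrap_sample(original, rng) for _ in range(3)]

for i, sample in enumerate(bootstrap_datasets, start=1):
    print(f"Bootstrap Dataset {i}: {' '.join(sample)}")
```

Because `choices` samples with replacement, duplicates and missing items are expected in each dataset.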
Training Multiple Models
Each bootstrap dataset trains a separate model.
Workflow:
Dataset 1 → Model 1
Dataset 2 → Model 2
Dataset 3 → Model 3
Dataset 4 → Model 4
Dataset 5 → Model 5
Because each model sees a different sample, each one learns slightly different patterns.
Prediction Phase
After training:
Each model makes a prediction.
Example:
Model 1 → Spam
Model 2 → Spam
Model 3 → Not Spam
Model 4 → Spam
Model 5 → Spam
Majority Voting
Classification problems use voting.
Votes:
Spam = 4
Not Spam = 1
Final Prediction:
Spam
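The vote count above can be reproduced with a small sketch. The function name `majority_vote` is hypothetical, and in this version a tie goes to whichever label was seen first, which is one possible convention rather than a rule of Bagging.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the most common label among the model predictions."""
    label, _count = Counter(predictions).most_common(1)[0]
    return label

votes = ["Spam", "Spam", "Not Spam", "Spam", "Spam"]
vote_counts = Counter(votes)
final_prediction = majority_vote(votes)

print(dict(vote_counts))
print("Final Prediction:", final_prediction)
```

Using an odd number of models avoids ties in two-class problems.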
Averaging for Regression
For regression tasks:
Predictions are averaged.
Example:
Model 1 → 45
Model 2 → 50
Model 3 → 55
Prediction:
(45 + 50 + 55) / 3
Final Output:
50
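The same averaging step in Python, as a minimal sketch using the three predictions from the example:

```python
from statistics import mean

predictions = [45, 50, 55]       # one prediction per model
final_output = mean(predictions) # (45 + 50 + 55) / 3

print("Final Output:", final_output)
```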
Bagging Workflow
Original Dataset
↓
Bootstrap Samples
↓
Train Multiple Models
↓
Generate Predictions
↓
Vote / Average
↓
Final Prediction
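The workflow above can be sketched end to end in plain Python. Everything here is an assumption made for illustration: the base model is a one-feature threshold rule (a depth-1 tree) rather than a full Decision Tree, the toy data is invented, and names such as `fit_stump` and `fit_bagging` are hypothetical.

```python
import random
from collections import Counter

def fit_stump(points):
    """Fit a threshold rule on (x, label) pairs with labels 0 or 1.

    Tries every observed x as a threshold, in both directions, and keeps
    the rule with the fewest training errors.
    """
    best = None
    for threshold in sorted({x for x, _ in points}):
        for below, above in ((0, 1), (1, 0)):
            errors = sum(
                (below if x <= threshold else above) != y for x, y in points
            )
            if best is None or errors < best[0]:
                best = (errors, threshold, below, above)
    _, threshold, below, above = best
    return threshold, below, above

def predict_stump(stump, x):
    threshold, below, above = stump
    return below if x <= threshold else above

def fit_bagging(points, n_models, rng):
    """Train one stump per bootstrap sample."""
    models = []
    for _ in range(n_models):
        sample = rng.choices(points, k=len(points))  # bootstrap sample
        models.append(fit_stump(sample))
    return models

def predict_bagging(models, x):
    """Aggregate the individual predictions by majority vote."""
    votes = Counter(predict_stump(model, x) for model in models)
    return votes.most_common(1)[0][0]

# Toy data: label is 1 when x > 5, plus one noisy point at x = 2.
train = [(1, 0), (2, 1), (3, 0), (4, 0), (5, 0),
         (6, 1), (7, 1), (8, 1), (9, 1), (10, 1)]

models = fit_bagging(train, n_models=25, rng=random.Random(0))
print("x = 1 ->", predict_bagging(models, 1))
print("x = 9 ->", predict_bagging(models, 9))
```

For regression, the vote in `predict_bagging` would be replaced by an average of the individual predictions.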
Why Bagging Works
Different models make different mistakes.
Example:
Model 1 Wrong
Model 2 Correct
Model 3 Correct
Majority voting still produces the correct answer.
Errors tend to cancel out, provided the models do not all make the same mistakes.
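A quick calculation shows how strong this effect can be in an idealised setting. The sketch assumes every model is correct with the same probability and that their errors are fully independent. Neither holds exactly for bagged models, which share training data, so the numbers are an upper bound on the intuition rather than a prediction. The function name `majority_accuracy` is hypothetical.

```python
from math import comb

def majority_accuracy(n_models, p_correct):
    """Probability that a strict majority of n_models is correct,
    assuming independent errors and equal accuracy p_correct."""
    needed = n_models // 2 + 1
    return sum(
        comb(n_models, k) * p_correct**k * (1 - p_correct) ** (n_models - k)
        for k in range(needed, n_models + 1)
    )

for n in (1, 3, 11, 101):
    print(f"{n:>3} models -> {majority_accuracy(n, 0.7):.4f}")
```

With three models that are each right 70% of the time, the majority is right about 78.4% of the time under these assumptions, and the figure keeps rising as models are added.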
Understanding Variance Reduction
Single Decision Tree:
Prediction Changes Easily
High variance.
Bagged Trees:
Average Many Trees
More stable predictions.
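The stabilising effect of averaging can be made precise under a simplifying assumption that the article does not state: suppose the predictions of the B trees at a given point each have variance σ² and every pair of trees has correlation ρ. The variance of their average is then:

```latex
\operatorname{Var}\!\left(\frac{1}{B}\sum_{b=1}^{B} f_b(x)\right)
  = \rho\,\sigma^{2} + \frac{1-\rho}{B}\,\sigma^{2}
```

The second term shrinks as B grows, which is why adding trees stabilises the prediction. The first term does not shrink, which is why making the trees less correlated also matters.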
Bias vs Variance
Bagging primarily reduces:
Variance
It does not significantly reduce bias.
Visualizing Variance Reduction
Single Tree:
/\/\/\/\/\/\
Very sensitive.
Bagged Ensemble:
----------
Smoother and more stable.
Out-of-Bag (OOB) Samples
An interesting property of bootstrap sampling:
Each bootstrap sample leaves out some observations (about 37% of them on average for a large dataset).
Example:
Original Dataset:
A B C D E
Bootstrap Sample:
A C C D E
Sample:
B
was not selected.
B becomes an:
Out-of-Bag Sample
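A short sketch makes the out-of-bag set concrete and shows where the commonly quoted "roughly 37%" figure comes from. The helper names are hypothetical, and the seed is fixed only so that repeated runs match.

```python
import random

def bootstrap_with_oob(data, rng):
    """Return (bootstrap sample, out-of-bag items) for one draw."""
    indices = [rng.randrange(len(data)) for _ in data]
    chosen = set(indices)
    sample = [data[i] for i in indices]
    oob = [item for i, item in enumerate(data) if i not in chosen]
    return sample, oob

def expected_oob_fraction(n):
    """Chance that a given row is missed by all n draws of one bootstrap sample."""
    return (1 - 1 / n) ** n

data = ["A", "B", "C", "D", "E"]
sample, oob = bootstrap_with_oob(data, random.Random(7))
print("Bootstrap sample:", sample)
print("Out-of-bag:", oob)

print(expected_oob_fraction(5))       # (4/5)^5
print(expected_oob_fraction(10_000))  # approaches 1/e for large n
```

Each model has its own out-of-bag set, because each model is trained on a different bootstrap sample.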
Why OOB Samples are Useful
Because a model never sees its OOB samples during training, they can be used as validation data for that model.
Benefits:
- No separate validation set required
- Efficient performance estimation
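
One way to turn this into a score is sketched below: every row is predicted only by the models whose bootstrap sample missed it, and those votes are compared with the true label. This is an illustrative sketch under stated assumptions, not how any particular library implements it. The base model is a one-feature nearest-neighbour rule chosen because it is short to write, the data is invented, and the names are hypothetical.

```python
import random
from collections import Counter

def nearest_neighbor_predict(train_rows, x):
    """Predict the label of the training row whose x is closest."""
    _, label = min(train_rows, key=lambda row: abs(row[0] - x))
    return label

def oob_accuracy(rows, n_models, rng):
    """Score each row using only the models that did not train on it."""
    n = len(rows)
    oob_votes = [Counter() for _ in range(n)]
    for _ in range(n_models):
        indices = [rng.randrange(n) for _ in range(n)]
        in_bag = set(indices)
        sample = [rows[i] for i in indices]
        for i in range(n):
            if i not in in_bag:  # row i is out-of-bag for this model
                x, _ = rows[i]
                oob_votes[i][nearest_neighbor_predict(sample, x)] += 1
    # Rows that were in every bag have no OOB votes and are skipped.
    scored = [(votes, rows[i][1]) for i, votes in enumerate(oob_votes) if votes]
    correct = sum(votes.most_common(1)[0][0] == y for votes, y in scored)
    return correct / len(scored)

# Toy data: label is 1 when x > 10.
rows = [(x, int(x > 10)) for x in range(1, 21)]
score = oob_accuracy(rows, n_models=50, rng=random.Random(1))
print(f"OOB accuracy: {score:.2f}")
```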
Decision Trees and Bagging
Bagging works especially well with:
Decision Trees
because trees have:
High Variance
Combining many trees dramatically improves stability.
Example: Loan Approval
Single Tree:
Accuracy = 82%
Bagged Trees:
Accuracy = 89%
Improved performance.
Example: Medical Diagnosis
Single model:
May overreact to unusual patients.
Bagged models:
Average multiple opinions.
Predictions become more reliable.
Example: Stock Prediction
Individual models:
Highly unstable.
Bagging:
Reduces the effect of random fluctuations in the training data and improves consistency.
Advantages of Bagging
Reduces Overfitting
Less sensitive to noise.
Improves Accuracy
Often outperforms individual models.
Reduces Variance
Creates more stable predictions.
Parallel Training
Models can be trained simultaneously.
Works with Many Algorithms
Not limited to Decision Trees.
Limitations of Bagging
Increased Computation
Multiple models must be trained.
Higher Memory Usage
Many models need storage.
Less Interpretable
Harder to explain than a single model.
Limited Bias Reduction
Primarily targets variance.
Bagging vs Single Decision Tree
| Single Tree | Bagging |
|---|---|
| High Variance | Low Variance |
| Less Stable | More Stable |
| Faster Training | Slower Training |
| Easier to Interpret | Harder to Interpret |
| More Overfitting Risk | Less Overfitting Risk |
Bagging vs Boosting
| Bagging | Boosting |
|---|---|
| Parallel Training | Sequential Training |
| Mainly Reduces Variance | Mainly Reduces Bias |
| Independent Models | Dependent Models |
| Easier to Parallelize | Harder to Parallelize |
Random Forest and Bagging
Random Forest is essentially:
Bagging
+
Decision Trees
+
Random Feature Selection
Bagging forms the foundation of Random Forests.
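The extra ingredient, random feature selection, can be sketched as follows. The assumptions here are that a fresh subset is drawn at each split and that its size is about the square root of the number of features, which is a common default for classification rather than a fixed rule. The function name is hypothetical.

```python
import math
import random

def random_feature_subset(n_features, rng):
    """Pick about sqrt(n_features) distinct feature indices."""
    k = max(1, math.isqrt(n_features))
    return sorted(rng.sample(range(n_features), k))

rng = random.Random(3)
for split in range(1, 4):
    features = random_feature_subset(16, rng)
    print(f"Split {split} may only consider features {features}")
```

Restricting each split to a different subset makes the trees less similar to one another, which is what lets averaging remove more variance than Bagging alone.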
Python Example
Using Scikit-Learn:
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
Create Model:
model = BaggingClassifier(
    estimator=DecisionTreeClassifier(),  # named base_estimator before scikit-learn 1.2
    n_estimators=100
)
Train:
model.fit(X_train, y_train)
Predict:
predictions = model.predict(X_test)
Using OOB Evaluation
model = BaggingClassifier(
    estimator=DecisionTreeClassifier(),
    n_estimators=100,
    oob_score=True
)
Train, then read the OOB accuracy estimate:
model.fit(X_train, y_train)
print(model.oob_score_)
Real-World Applications
Fraud Detection
Combining multiple fraud classifiers.
Healthcare
Disease prediction systems.
Finance
Credit risk analysis.
Recommendation Systems
Aggregating user preference models.
Marketing
Customer behavior prediction.
Common Mistakes
Using Too Few Models
Small ensembles may not provide sufficient variance reduction.
Ignoring OOB Evaluation
Useful validation information may be wasted.
Assuming Bagging Solves All Problems
Bagging reduces variance but may not fix high bias.
Best Practices
- Use enough estimators
- Monitor OOB score
- Use high-variance base models such as Decision Trees
- Compare with Random Forest
- Validate on unseen data
Bagging Summary
| Concept | Purpose |
|---|---|
| Bootstrap Sampling | Create Diverse Datasets |
| Aggregation | Combine Predictions |
| Voting | Classification |
| Averaging | Regression |
| OOB Samples | Validation |
Bagging Workflow Summary
- Create bootstrap samples
- Train multiple models
- Generate predictions
- Aggregate outputs
- Produce final prediction
- Evaluate performance
Why Bagging is Important
Bagging was one of the first major breakthroughs in ensemble learning because it demonstrated that combining multiple unstable models can dramatically improve predictive performance. By training models on different bootstrap samples and aggregating their predictions, Bagging reduces variance, improves stability, and often achieves higher accuracy than a single model.
Understanding Bagging is essential because it forms the foundation of one of the most successful Machine Learning algorithms ever developed: Random Forest.
In the next article, we will study Random Forest, the ensemble algorithm that combines Bagging with random feature selection to create highly accurate and robust predictive models.