In the previous article, we learned about Cross Entropy Loss, which measures how well a classification model predicts probabilities.

However, once a classification model starts making predictions, we often want answers to questions such as:

  • How many predictions were correct?
  • How many predictions were incorrect?
  • What kinds of mistakes is the model making?
  • Is the model missing important cases?
  • Is it generating too many false alarms?

Accuracy alone cannot answer these questions.

Consider a fraud detection system.

Suppose:

99% Genuine Transactions
1% Fraudulent Transactions

A model that predicts:

Always Genuine

would achieve:

99% Accuracy

Yet it would detect zero frauds.

Clearly, accuracy alone is not enough.

To understand model performance in detail, we use the Confusion Matrix.

The Confusion Matrix is the foundation of classification evaluation and forms the basis of metrics such as:

  • Precision
  • Recall
  • F1 Score
  • Specificity
  • ROC-AUC

What is a Confusion Matrix?

A Confusion Matrix is a table that summarizes the performance of a classification model by comparing:

Actual Classes
vs
Predicted Classes

It shows:

  • Correct predictions
  • Incorrect predictions
  • Types of mistakes made by the model

Binary Classification Example

Suppose we are building a disease prediction system.

Possible outcomes:

Positive
Negative

Or:

1
0

The Confusion Matrix contains four outcomes.

Structure of a Confusion Matrix

Actual / Predicted    Positive    Negative
Positive              TP          FN
Negative              FP          TN

Where:

  • TP = True Positive
  • TN = True Negative
  • FP = False Positive
  • FN = False Negative

These four values form the foundation of classification evaluation.

Understanding True Positive (TP)

True Positive means:

Actually Positive
Predicted Positive

Example:

Disease Present:

Yes

Prediction:

Disease Present

The model is correct.

Example

Patient actually has the disease.

Model predicts:

Positive

Result:

True Positive.

Understanding True Negative (TN)

True Negative means:

Actually Negative
Predicted Negative

Example:

Patient does not have the disease.

Model predicts:

Negative

The model is correct.

Example

Disease:

Absent

Prediction:

Absent

Result:

True Negative.

Understanding False Positive (FP)

False Positive means:

Actually Negative
Predicted Positive

The model incorrectly predicts the positive class.

Example

Patient:

Healthy

Prediction:

Disease Present

Result:

False Positive.

This is often called:

False Alarm

Understanding False Negative (FN)

False Negative means:

Actually Positive
Predicted Negative

The model misses a positive case.

Example

Patient:

Has Disease

Prediction:

Healthy

Result:

False Negative.

In medical diagnosis, this is often the most dangerous error, because a sick patient may go untreated.

Summary of the Four Outcomes

Actual      Predicted    Outcome
Positive    Positive     True Positive
Positive    Negative     False Negative
Negative    Positive     False Positive
Negative    Negative     True Negative
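The table above is essentially a lookup from an (actual, predicted) pair to one of four outcomes. Below is a minimal sketch that assumes labels are encoded as 1 for positive and 0 for negative. `outcome` and `count_outcomes` are hypothetical helper names used only for illustration, and the five patients are made-up data:

```python
from collections import Counter

def outcome(actual: int, predicted: int) -> str:
    """Name the confusion-matrix cell for one (actual, predicted) pair.

    Assumes 1 = positive class and 0 = negative class.
    """
    if actual == 1 and predicted == 1:
        return "TP"
    if actual == 1 and predicted == 0:
        return "FN"
    if actual == 0 and predicted == 1:
        return "FP"
    return "TN"  # actual == 0 and predicted == 0

def count_outcomes(y_true, y_pred) -> Counter:
    """Tally TP, FN, FP and TN over paired lists of labels."""
    return Counter(outcome(a, p) for a, p in zip(y_true, y_pred))

# Five hypothetical patients: actual labels vs. model predictions
y_true = [1, 1, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1]
print(count_outcomes(y_true, y_pred))
```

Building a confusion matrix amounts to tallying these four outcomes. Libraries such as scikit-learn arrange the same counts into a 2x2 array.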

Example Confusion Matrix

Suppose:

100 patients.

Model predictions:

Actual / Predicted    Positive    Negative
Positive              40          10
Negative              5           45

Therefore:

TP = 40
FN = 10
FP = 5
TN = 45

Visualizing the Matrix

                        Predicted
                   Positive    Negative
Actual Positive       40          10
Actual Negative        5          45

Total Predictions

Total samples:

$$40 + 10 + 5 + 45 = 100$$

Understanding Correct Predictions

Correct predictions:

$$TP + TN = 40 + 45 = 85$$

Correctly classified samples:

85

Understanding Incorrect Predictions

Incorrect predictions:

$$FP + FN = 5 + 10 = 15$$

Wrong predictions:

15
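The three sums above are simple arithmetic on the four counts. Here is a quick check in Python, using the counts from the 100-patient example:

```python
# Counts from the 100-patient example above
tp, fn, fp, tn = 40, 10, 5, 45

total = tp + fn + fp + tn   # every sample lands in exactly one cell
correct = tp + tn           # diagonal: predictions that match the actual class
incorrect = fp + fn         # off-diagonal: the two kinds of mistakes

print(total, correct, incorrect)  # → 100 85 15
```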

Why is it Called a "Confusion" Matrix?

Because it shows where the model becomes confused.

Example:

The model may confuse:

Disease
with
No Disease

or

Spam
with
Not Spam

The matrix reveals these mistakes.

Real-World Example: Spam Detection

Positive Class:

Spam

Negative Class:

Not Spam

True Positive

Spam email correctly identified.

True Negative

Legitimate email correctly identified.

False Positive

Legitimate email marked as spam.

False Negative

Spam email reaches the inbox.

Real-World Example: Fraud Detection

Positive Class:

Fraud

Negative Class:

Genuine

True Positive

Fraud detected correctly.

True Negative

Genuine transaction accepted.

False Positive

Valid transaction blocked.

False Negative

Fraud transaction missed.

Which Error is Worse?

The answer depends on the application.

Medical Diagnosis

False Negative is often worse.

Example:

Patient has cancer.

Model predicts:

Healthy

Dangerous mistake.

Spam Detection

False Positive may be worse.

Example:

Important email moved to spam folder.

Fraud Detection

Both errors can be costly.

Business requirements determine priorities.
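One way to turn business requirements into a decision is to assign a cost to each kind of error and compare models by total cost. This is a minimal sketch. The per-error costs and the second model's counts are hypothetical placeholders, not figures from the example:

```python
def total_error_cost(fp: int, fn: int, cost_fp: float, cost_fn: float) -> float:
    """Total cost of a model's mistakes, given a cost per false positive
    and per false negative. Correct predictions are assumed to cost nothing."""
    return fp * cost_fp + fn * cost_fn

# Two hypothetical models evaluated on the same data
model_a = {"fp": 5, "fn": 10}   # fewer false alarms, more misses
model_b = {"fp": 20, "fn": 2}   # more false alarms, fewer misses

# Hypothetical costs: a missed case is assumed to be 10x worse than a false alarm
cost_fp, cost_fn = 1.0, 10.0

cost_a = total_error_cost(model_a["fp"], model_a["fn"], cost_fp, cost_fn)
cost_b = total_error_cost(model_b["fp"], model_b["fn"], cost_fp, cost_fn)
print(cost_a, cost_b)  # → 105.0 40.0
```

With these costs, Model B is cheaper despite its 20 false alarms. Swapping the two costs, so that false alarms are 10x worse, makes Model A cheaper (60 vs 202). The costs, not the counts alone, decide which error matters more.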

Accuracy from the Confusion Matrix

Accuracy measures overall correctness.

Formula:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

Using our example:

$$\text{Accuracy} = \frac{40 + 45}{100} = 85\%$$
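The formula translates directly into a function. This is a minimal sketch: `accuracy` is an illustrative helper, not a library call. scikit-learn provides `sklearn.metrics.accuracy_score`, which works on label arrays rather than counts:

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that are correct: (TP + TN) / (TP + TN + FP + FN)."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    return (tp + tn) / total

print(accuracy(tp=40, tn=45, fp=5, fn=10))  # → 0.85
```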

Why Accuracy Can Be Misleading

Suppose:

1000 transactions.

Type       Count
Genuine    990
Fraud      10

Model predicts:

Everything Genuine

Confusion Matrix:

Actual / Predicted    Fraud    Genuine
Fraud                 0        10
Genuine               0        990

Accuracy:

$$\text{Accuracy} = \frac{0 + 990}{1000} = 99\%$$

Yet:

It detects none of the 10 fraudulent transactions (TP = 0), so as a fraud detector it is useless.

This is why we need additional metrics.
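The paradox is easy to reproduce with scikit-learn. This sketch assumes fraud is encoded as 1 and genuine as 0, with the 990/10 split from the example. `accuracy_score` and `confusion_matrix` are standard `sklearn.metrics` functions:

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# 990 genuine (0) and 10 fraudulent (1) transactions
y_true = np.array([0] * 990 + [1] * 10)

# A "model" that always predicts genuine
y_pred = np.zeros_like(y_true)

print(accuracy_score(y_true, y_pred))  # → 0.99

# labels=[1, 0] puts Fraud first in rows (actual) and columns (predicted)
cm = confusion_matrix(y_true, y_pred, labels=[1, 0])
print(cm)

frauds_caught = cm[0, 0]  # actual fraud, predicted fraud (TP)
print(frauds_caught)  # → 0
```

Accuracy looks excellent while the true-positive cell is zero, which is exactly what the confusion matrix exposes.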

Confusion Matrix as a Foundation

Many classification metrics come directly from:

  • TP
  • TN
  • FP
  • FN

Examples:

Metric         Uses
Precision      TP, FP
Recall         TP, FN
F1 Score       Precision, Recall
Specificity    TN, FP
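Each row of the table corresponds to a short formula. The sketch below uses the standard definitions, covered in depth in the next article, and applies them to the 100-patient counts. `metrics_from_counts` is a hypothetical helper, and it does not guard against zero denominators:

```python
def metrics_from_counts(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Derive common metrics from the four confusion-matrix counts.

    Assumes no denominator is zero (e.g. it fails for a model that never
    predicts the positive class).
    """
    precision = tp / (tp + fp)      # of predicted positives, how many were right
    recall = tp / (tp + fn)         # of actual positives, how many were found
    specificity = tn / (tn + fp)    # of actual negatives, how many were found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
    return {"precision": precision, "recall": recall,
            "specificity": specificity, "f1": f1}

for name, value in metrics_from_counts(tp=40, tn=45, fp=5, fn=10).items():
    print(f"{name}: {value:.3f}")
```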

Multi-Class Confusion Matrix

The concept extends beyond binary classification.

Example:

Animal Classification

Classes:

Cat
Dog
Horse

The matrix becomes larger.

Example:

Actual / Predicted    Cat    Dog    Horse
Cat                   40     3      2
Dog                   4      35     1
Horse                 2      3      30

Each row shows actual labels.

Each column shows predicted labels.
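With more than two classes there is no single TP, FP, FN and TN. Instead, each class can be treated one-vs-rest: that class counts as "positive" and all others as "negative". The sketch below uses NumPy with the counts as read from the table above; `one_vs_rest` is a hypothetical helper:

```python
import numpy as np

classes = ["Cat", "Dog", "Horse"]

# Rows = actual class, columns = predicted class (counts from the table above)
cm = np.array([
    [40, 3, 2],
    [4, 35, 1],
    [2, 3, 30],
])

def one_vs_rest(cm: np.ndarray, i: int) -> tuple:
    """Return (TP, FN, FP, TN) for class i, treating every other class as negative."""
    tp = cm[i, i]                  # actual i, predicted i
    fn = cm[i, :].sum() - tp       # actual i, predicted some other class
    fp = cm[:, i].sum() - tp       # predicted i, actually some other class
    tn = cm.sum() - tp - fn - fp   # neither actual nor predicted i
    return int(tp), int(fn), int(fp), int(tn)

print("correct:", np.trace(cm), "of", cm.sum())  # diagonal = correct predictions
for i, name in enumerate(classes):
    tp, fn, fp, tn = one_vs_rest(cm, i)
    print(f"{name}: TP={tp} FN={fn} FP={fp} TN={tn}")
```

Correct predictions lie on the diagonal, so here 105 of 120 samples are classified correctly.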

Python Implementation

Using Scikit-Learn:

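from sklearn.metrics import confusion_matrix

# y_true: actual labels, y_pred: the model's predicted labels
# (both array-like, e.g. lists or NumPy arrays of 0s and 1s)
cm = confusion_matrix(y_true, y_pred)

print(cm)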

Example Output:

[[45  5]
 [10 40]]

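Interpretation:

scikit-learn orders the labels as 0, 1, so the negative class comes first and the layout is [[TN, FP], [FN, TP]]. This is the reverse of the positive-first tables earlier in this article: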

TN = 45
FP = 5
FN = 10
TP = 40
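The snippet above assumes `y_true` and `y_pred` already exist. Here is a self-contained sketch with hypothetical label arrays built to reproduce the example counts. Unpacking with `.ravel()` is a common idiom for binary problems with labels 0 and 1:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical labels reproducing the 100-patient example (1 = disease, 0 = healthy):
# 50 actual positives (40 caught, 10 missed) and 50 actual negatives (45 correct, 5 false alarms)
y_true = np.array([1] * 50 + [0] * 50)
y_pred = np.array([1] * 40 + [0] * 10 + [0] * 45 + [1] * 5)

cm = confusion_matrix(y_true, y_pred)
print(cm)

# For binary 0/1 labels, ravel() flattens [[TN, FP], [FN, TP]] row by row
tn, fp, fn, tp = cm.ravel()
print(tn, fp, fn, tp)  # → 45 5 10 40
```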

Visualizing the Confusion Matrix

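import seaborn as sns
import matplotlib.pyplot as plt

# cm: the 2x2 array returned by confusion_matrix above (labels 0, 1 in sorted order)
sns.heatmap(
    cm,
    annot=True,  # print the count inside each cell
    fmt="d",     # show counts as integers
    xticklabels=["Negative", "Positive"],
    yticklabels=["Negative", "Positive"],
)

plt.xlabel("Predicted")
plt.ylabel("Actual")

plt.show()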

Interpreting a Confusion Matrix

When examining a confusion matrix:

Ask:

  1. How many correct predictions?
  2. How many false positives?
  3. How many false negatives?
  4. Which error is most costly?
  5. Is class imbalance present?

Common Applications

Medical Diagnosis

Detecting diseases.

Fraud Detection

Identifying fraudulent transactions.

Email Filtering

Classifying spam emails.

Customer Churn Prediction

Predicting customer departures.

Credit Risk Assessment

Approving or rejecting loans.

Common Mistakes

Looking Only at Accuracy

Accuracy may hide important errors.

Ignoring False Negatives

Some applications require minimizing missed detections.

Ignoring False Positives

False alarms can be expensive.

Misinterpreting Matrix Orientation

Always verify:

  • Rows = Actual
  • Columns = Predicted

scikit-learn uses this convention, but some textbooks and tools transpose it, so check the documentation of whichever library you use.
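scikit-learn's `confusion_matrix` puts actual labels in rows and predicted labels in columns. Labels appear in sorted order unless the `labels` argument specifies otherwise. The sketch below compares three views of the same predictions on a tiny made-up dataset:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Tiny made-up example: TP = 1, FN = 2, FP = 1, TN = 2
y_true = np.array([1, 1, 1, 0, 0, 0])
y_pred = np.array([1, 0, 0, 1, 0, 0])

# Default: labels sorted ascending, so row/column 0 is the negative class
default_cm = confusion_matrix(y_true, y_pred)
print(default_cm)      # layout [[TN, FP], [FN, TP]]

# labels=[1, 0] lists the positive class first
positive_first = confusion_matrix(y_true, y_pred, labels=[1, 0])
print(positive_first)  # layout [[TP, FN], [FP, TN]]

# A tool that puts predictions in rows would show the transpose
predicted_rows = default_cm.T
print(predicted_rows)
```

Passing `labels=[1, 0]` gives the positive-first layout used in the tables earlier in this article. The example deliberately has FP different from FN, so that a transposed matrix is visibly different.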

Best Practices

  • Always inspect the confusion matrix
  • Analyze both FP and FN
  • Consider business impact
  • Use additional metrics
  • Evaluate on unseen data

Confusion Matrix Workflow

A typical workflow is:

  1. Train classifier
  2. Generate predictions
  3. Build confusion matrix
  4. Identify TP, TN, FP, FN
  5. Analyze errors
  6. Calculate evaluation metrics
  7. Improve model
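The workflow above can be sketched end to end with scikit-learn. This illustration rests on two assumptions: synthetic imbalanced data from `make_classification` stands in for a real dataset, and logistic regression is an arbitrary choice of classifier:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced binary data (a stand-in for a real dataset)
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# 1. Train classifier
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# 2. Generate predictions on unseen (held-out) data
y_pred = model.predict(X_test)

# 3. Build confusion matrix (rows = actual, columns = predicted)
cm = confusion_matrix(y_test, y_pred, labels=[0, 1])

# 4. Identify TP, TN, FP, FN
tn, fp, fn, tp = cm.ravel()

# 5-6. Analyze errors and calculate evaluation metrics
accuracy = (tp + tn) / cm.sum()
recall = tp / (tp + fn) if (tp + fn) else 0.0
print(cm)
print(f"FP={fp} FN={fn} accuracy={accuracy:.3f} recall={recall:.3f}")
```

Step 7 would loop back to training with changes (features, classifier, settings) and then repeat steps 2 to 6 to see how the error counts shift.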

Why the Confusion Matrix is Important

The Confusion Matrix is one of the most important tools in classification because it reveals exactly how a model is making decisions and where it is making mistakes.

Unlike accuracy, which provides only a single number, the Confusion Matrix offers a complete breakdown of correct and incorrect predictions. This makes it the foundation for nearly every advanced classification metric used in Machine Learning.

Understanding the Confusion Matrix is essential because Precision, Recall, F1 Score, Specificity, ROC Curves, and many other evaluation techniques are all built directly from its four core components: True Positives, True Negatives, False Positives, and False Negatives.

In the next article, we will study Precision, Recall, and F1 Score, the most important classification metrics derived from the Confusion Matrix.