Precision, Recall, and F1 Score

Last updated: Jun 13, 2026
Author: Christy Harshitha Dakarapu

In the previous article, we learned about the Confusion Matrix, which breaks classification predictions into:

  • True Positives (TP)
  • True Negatives (TN)
  • False Positives (FP)
  • False Negatives (FN)

We also discovered an important problem:

A model can achieve very high accuracy and still be practically useless.

Consider a fraud detection system:

| Transaction Type | Count |
|---|---|
| Genuine | 990 |
| Fraud | 10 |

Suppose a model predicts:

Everything is Genuine

Accuracy:

$$\text{Accuracy} = \frac{990}{1000} = 99\%$$

This looks impressive.

However, the model detects 0 of the 10 frauds.

Clearly, accuracy alone cannot tell the complete story.
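As a quick sketch, the snippet below scores a model that labels every transaction as genuine, using scikit-learn's `accuracy_score` on the 990/10 split above. Encoding fraud as 1 and genuine as 0 is an assumption made for illustration.

```python
from sklearn.metrics import accuracy_score

# 990 genuine transactions (label 0) followed by 10 fraud transactions (label 1)
y_true = [0] * 990 + [1] * 10

# A "model" that predicts Genuine for every transaction
y_pred = [0] * len(y_true)

accuracy = accuracy_score(y_true, y_pred)

# Count frauds the model actually caught (true positives)
frauds_detected = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)

print(f"Accuracy: {accuracy:.2%}")            # Accuracy: 99.00%
print(f"Frauds detected: {frauds_detected}")  # Frauds detected: 0
```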

To address this problem, machine learning relies on three important evaluation metrics:

  • Precision
  • Recall
  • F1 Score

These metrics help us understand different aspects of classification performance and are widely used in:

  • Healthcare
  • Fraud Detection
  • Cybersecurity
  • Search Engines
  • Recommendation Systems
  • Deep Learning

Why Accuracy is Not Enough

Consider the following confusion matrix.

| Actual / Predicted | Positive | Negative |
|---|---|---|
| Positive | 10 | 90 |
| Negative | 0 | 900 |

Accuracy:

$$\text{Accuracy} = \frac{10+900}{1000} = 91\%$$

This looks good.

However, the model misses 90 of the 100 positive cases.

This can be disastrous in applications such as disease detection.

Recap: Confusion Matrix

| Actual / Predicted | Positive | Negative |
|---|---|---|
| Positive | TP | FN |
| Negative | FP | TN |

Where:

  • TP = True Positive
  • TN = True Negative
  • FP = False Positive
  • FN = False Negative

Precision, Recall, and F1 Score are derived directly from these values.
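In scikit-learn, these four counts can be read from `confusion_matrix`. With `labels=[0, 1]` the matrix is laid out as `[[TN, FP], [FN, TP]]` (negative class first, unlike the table above), so `ravel()` returns TN, FP, FN, TP in that order. This sketch rebuilds the 1,000-sample example from the previous section; the label lists are constructed for illustration.

```python
from sklearn.metrics import confusion_matrix

# Rebuild the example: 100 actual positives followed by 900 actual negatives
y_true = [1] * 100 + [0] * 900
# The model labels only 10 of the positives as positive and every negative as negative
y_pred = [1] * 10 + [0] * 90 + [0] * 900

# With labels=[0, 1] the matrix is [[TN, FP], [FN, TP]]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()

print(f"TP={tp}, FN={fn}, FP={fp}, TN={tn}")  # TP=10, FN=90, FP=0, TN=900
```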

What is Precision?

Precision answers the question:

Out of all positive predictions, how many were actually positive?

Formula:

$$\text{Precision} = \frac{TP}{TP + FP}$$

Understanding Precision Intuitively

Suppose a model predicts:

100 Emails are Spam

Reality:

80 Actually Spam
20 Not Spam

Precision:

$$\text{Precision} = \frac{80}{100} = 0.8 = 80\%$$
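The formula translates directly into code. This is a minimal sketch with a hypothetical helper named `precision`; returning 0.0 when the model makes no positive predictions is a convention chosen here, since the ratio is otherwise undefined.

```python
def precision(tp: int, fp: int) -> float:
    """Precision = TP / (TP + FP)."""
    predicted_positive = tp + fp
    # Convention used in this sketch: 0.0 when the model predicted no positives
    return tp / predicted_positive if predicted_positive else 0.0


# Spam example: 100 emails flagged as spam, 80 of them really spam
spam_precision = precision(tp=80, fp=20)
print(f"Precision: {spam_precision:.0%}")  # Precision: 80%
```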

Precision Interpretation

High Precision means:

When the model predicts Positive,
it is usually correct.

Example

Fraud Detection:

Predicted Fraud:

100 transactions

Actually Fraud:

95 transactions

Precision:

$$\text{Precision} = \frac{95}{100} = 95\%$$

Very high precision.

Why Precision Matters

Precision is important when:

False Positives are costly.

Examples:

  • Spam Detection
  • Loan Approval
  • Legal Systems
  • Content Moderation

Spam Detection Example

False Positive:

Important Email
↓
Marked as Spam

This is undesirable.

High Precision minimizes such mistakes.

What is Recall?

Recall answers the question:

Out of all actual positives, how many did the model correctly identify?

Formula:

$$\text{Recall} = \frac{TP}{TP + FN}$$

Understanding Recall Intuitively

Suppose:

Actual Fraud Cases:

100

Correctly Detected Fraud Cases:

90

Recall:

$$\text{Recall} = \frac{90}{100} = 0.9 = 90\%$$
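Recall can be sketched the same way. The helper name `recall` is hypothetical, and returning 0.0 when there are no actual positives is a convention chosen here to avoid dividing by zero.

```python
def recall(tp: int, fn: int) -> float:
    """Recall = TP / (TP + FN)."""
    actual_positive = tp + fn
    # Convention used in this sketch: 0.0 when there are no actual positives
    return tp / actual_positive if actual_positive else 0.0


# Fraud example: 100 actual fraud cases, 90 detected, so 10 were missed
fraud_recall = recall(tp=90, fn=10)
print(f"Recall: {fraud_recall:.0%}")  # Recall: 90%
```

Note that false positives play no part here: a model could flag thousands of genuine transactions and its recall would stay at 90%.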

Recall Interpretation

High Recall means:

The model finds most positive cases.

Why Recall Matters

Recall becomes critical when:

False Negatives are dangerous.

Examples:

  • Cancer Detection
  • Fraud Detection
  • Security Systems
  • Disaster Prediction

Medical Diagnosis Example

False Negative:

Patient Has Cancer
↓
Model Predicts Healthy

This can be life-threatening.

High Recall minimizes missed cases.

Precision vs Recall

These metrics focus on different goals.

Precision Focus

When Positive is Predicted,
Be Correct

Recall Focus

Find As Many Positives
As Possible

Example

Suppose:

Actual Positive Cases:

100

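Model A:

Correctly detects 50 positives and raises no false alarms.

Precision:

100%

Recall:

50%

Model B:

Correctly detects 95 positives, but also flags many negatives as positive.

Precision:

70%

Recall:

95%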

Different models prioritize different objectives.
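The Model A / Model B comparison can be reproduced with scikit-learn. This is a sketch on hypothetical data: we assume 100 actual positives and 100 actual negatives, and we assume Model B raises 41 false alarms, a count chosen only so that its precision comes out at roughly 70% (95 / 136 ≈ 0.699).

```python
from sklearn.metrics import precision_score, recall_score

# Hypothetical data: 100 actual positives (1) followed by 100 actual negatives (0)
y_true = [1] * 100 + [0] * 100

# Model A: flags 50 of the positives and no negatives
model_a = [1] * 50 + [0] * 50 + [0] * 100

# Model B: flags 95 of the positives plus 41 negatives (assumed false alarms)
model_b = [1] * 95 + [0] * 5 + [1] * 41 + [0] * 59

for name, y_pred in [("Model A", model_a), ("Model B", model_b)]:
    p = precision_score(y_true, y_pred)
    r = recall_score(y_true, y_pred)
    print(f"{name}: precision={p:.0%}, recall={r:.0%}")
# Model A: precision=100%, recall=50%
# Model B: precision=70%, recall=95%
```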

Visualizing Precision

Predicted Positive
↓

Correct Positive
Incorrect Positive

↑
Precision Measures This

Visualizing Recall

Actual Positive Cases
↓

Found Positives
Missed Positives

↑
Recall Measures This

Precision Example Calculation

Confusion Matrix:

| Actual / Predicted | Positive | Negative |
|---|---|---|
| Positive | 80 | 20 |
| Negative | 10 | 90 |

Precision:

$$\text{Precision} = \frac{80}{80+10} \approx 0.889 = 88.9\%$$

Recall Example Calculation

Same matrix:

Recall:

$$\text{Recall} = \frac{80}{80+20} = 0.8 = 80\%$$
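Both hand calculations can be checked with scikit-learn by expanding the matrix back into label lists. This is a sketch: the ordering of samples in the lists is arbitrary, since only the counts matter.

```python
from sklearn.metrics import precision_score, recall_score

# Expand the matrix into label lists (1 = positive, 0 = negative):
# 100 actual positives (TP = 80, FN = 20) and 100 actual negatives (FP = 10, TN = 90)
y_true = [1] * 100 + [0] * 100
y_pred = [1] * 80 + [0] * 20 + [1] * 10 + [0] * 90

precision_value = precision_score(y_true, y_pred)  # 80 / 90
recall_value = recall_score(y_true, y_pred)        # 80 / 100

print(f"Precision: {precision_value:.3f}")  # Precision: 0.889
print(f"Recall: {recall_value:.3f}")        # Recall: 0.800
```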

The Precision-Recall Tradeoff

Improving one often reduces the other.

Example:

Very Strict Model:

Predict Positive
Only When Extremely Certain

Results:

  • High Precision
  • Low Recall

Very Relaxed Model:

Predict Positive Frequently

Results:

  • High Recall
  • Lower Precision

Example: Airport Security

Thorough Screening (flag anyone who looks even slightly suspicious):

More people flagged.

Results:

  • High Recall
  • Lower Precision

Light Screening (flag only clear threats):

Fewer people flagged.

Results:

  • Higher Precision
  • Lower Recall
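In practice, the strict versus relaxed behavior usually comes from the decision threshold applied to a classifier's predicted probabilities. The sketch below uses made-up labels and scores (hypothetical, not from a real model) and sweeps three thresholds: a high threshold behaves like the strict model, a low one like the relaxed model.

```python
from sklearn.metrics import precision_score, recall_score

# Made-up labels and predicted probabilities for 10 samples (5 actual positives)
y_true = [1, 1, 1, 0, 1, 0, 1, 0, 0, 0]
scores = [0.95, 0.85, 0.75, 0.65, 0.55, 0.45, 0.35, 0.25, 0.15, 0.05]

results = {}
for threshold in (0.8, 0.5, 0.2):
    # Predict positive only when the score reaches the threshold
    y_pred = [1 if s >= threshold else 0 for s in scores]
    results[threshold] = (
        precision_score(y_true, y_pred),
        recall_score(y_true, y_pred),
    )
    p, r = results[threshold]
    print(f"threshold={threshold}: precision={p:.3f}, recall={r:.3f}")
# threshold=0.8: precision=1.000, recall=0.400
# threshold=0.5: precision=0.800, recall=0.800
# threshold=0.2: precision=0.625, recall=1.000
```

Here the highest threshold never raises a false alarm but finds only 2 of the 5 positives, while the lowest finds all of them at the cost of 3 false positives.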

What is F1 Score?

Sometimes we need a balance between Precision and Recall.

F1 Score combines both into a single metric.

Formula:

$$F1 = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$

Why Not Use Average?

Suppose:

Precision:

100%

Recall:

0%

Average:

50%

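This looks acceptable.

However, a model with 0% recall never finds a single positive case, so it is useless in practice.

F1 uses the harmonic mean, which is pulled toward the smaller of the two values and therefore penalizes extreme imbalance. Here:

$$F1 = 2 \times \frac{1.0 \times 0}{1.0 + 0} = 0$$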

Example Calculation

Precision:

0.8

Recall:

0.6

F1:

$$F1 = 2 \times \frac{0.8 \times 0.6}{0.8 + 0.6} = \frac{0.96}{1.4} \approx 0.686$$
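The sketch below compares the arithmetic mean with F1 for the extreme case above, the worked example, and a balanced model. The helper name `f1` is hypothetical, and returning 0.0 when precision and recall are both zero is a convention chosen here to avoid dividing by zero.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall: 2PR / (P + R)."""
    if precision + recall == 0:
        return 0.0  # both zero: define F1 as 0 in this sketch
    return 2 * precision * recall / (precision + recall)


for p, r in [(1.0, 0.0), (0.8, 0.6), (0.9, 0.9)]:
    arithmetic = (p + r) / 2
    print(f"P={p}, R={r}: average={arithmetic:.3f}, F1={f1(p, r):.3f}")
# P=1.0, R=0.0: average=0.500, F1=0.000
# P=0.8, R=0.6: average=0.700, F1=0.686
# P=0.9, R=0.9: average=0.900, F1=0.900
```

When precision and recall are equal, the two means agree; the further apart they are, the more F1 falls below the simple average.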

F1 Score Interpretation

High F1 means:

  • High Precision
  • High Recall

Balanced performance.

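F1 Score Range

F1 ranges from 0 to 1. The labels below are only a rough guide; what counts as good depends on the problem and on the baseline being compared against.

| Value | Interpretation |
|---|---|
| 1.0 | Perfect |
| 0.8 | Very Good |
| 0.5 | Moderate |
| 0.0 | Poor |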

Comparing Metrics

Suppose:

| Metric | Value |
|---|---|
| Accuracy | 95% |
| Precision | 50% |
| Recall | 40% |
| F1 Score | 44% |

Despite its high accuracy, the classifier performs poorly on the positive class.

Example: Disease Detection

Confusion Matrix:

| Actual / Predicted | Disease | Healthy |
|---|---|---|
| Disease | 90 | 10 |
| Healthy | 20 | 180 |

Precision:

$$\text{Precision} = \frac{90}{90+20} \approx 81.8\%$$

Recall:

$$\text{Recall} = \frac{90}{90+10} = 90\%$$

F1 Score:

$$F1 = 2 \times \frac{0.818 \times 0.9}{0.818 + 0.9} \approx 85.7\%$$

This model performs well.
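Because TN appears in none of the three formulas, all of them can be computed from TP, FP and FN alone. This sketch uses a hypothetical helper `metrics_from_counts` and the algebraically equivalent count form of F1, 2TP / (2TP + FP + FN); it assumes at least one positive prediction and at least one actual positive, so no denominator is zero.

```python
def metrics_from_counts(tp: int, fp: int, fn: int) -> dict:
    """Precision, recall and F1 computed directly from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # Equivalent to 2PR / (P + R), written in terms of counts
    f1 = 2 * tp / (2 * tp + fp + fn)
    return {"precision": precision, "recall": recall, "f1": f1}


# Disease example: TP=90, FN=10, FP=20 (TN=180 is not needed for these metrics)
scores = metrics_from_counts(tp=90, fp=20, fn=10)
for name, value in scores.items():
    print(f"{name}: {value:.1%}")
# precision: 81.8%
# recall: 90.0%
# f1: 85.7%
```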

Example: Spam Detection

Suppose:

Precision:

95%

Recall:

60%

Interpretation:

Most detected spam emails are truly spam.

However:

Many spam emails are still reaching the inbox.

Choosing the Right Metric

When Precision Matters Most

Examples:

  • Spam Detection
  • Loan Approval
  • Search Results

Goal:

Avoid false positives.

When Recall Matters Most

Examples:

  • Cancer Detection
  • Fraud Detection
  • Intrusion Detection

Goal:

Avoid false negatives.

When F1 Score Matters Most

Examples:

  • Imbalanced Datasets
  • General Classification Problems
  • Production ML Systems

Goal:

Balance precision and recall.

Python Implementation

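Precision:

```python
from sklearn.metrics import precision_score

# Example labels: 1 = positive, 0 = negative
y_true = [1, 1, 1, 0, 0, 0]  # actual labels
y_pred = [1, 1, 0, 1, 0, 0]  # model predictions

# TP = 2, FP = 1, so precision = 2 / 3 ≈ 0.667
print(precision_score(y_true, y_pred))
```

Recall:

```python
from sklearn.metrics import recall_score

# Same y_true and y_pred as above; TP = 2, FN = 1, so recall = 2 / 3 ≈ 0.667
print(recall_score(y_true, y_pred))
```

F1 Score:

```python
from sklearn.metrics import f1_score

# Harmonic mean of the precision and recall above ≈ 0.667
print(f1_score(y_true, y_pred))
```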

Classification Report

Scikit-Learn provides all metrics together.

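```python
from sklearn.metrics import classification_report

# y_true: actual labels, y_pred: predicted labels (lists or arrays of equal length)
print(classification_report(y_true, y_pred))
```

Example output (a simplified summary of the positive-class row; the actual report is a table with precision, recall, f1-score and support for each class, plus averaged rows):

Precision: 0.88
Recall: 0.84
F1 Score: 0.86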

Real-World Applications

Healthcare

High Recall preferred.

Missing a disease is costly.

Fraud Detection

High Recall preferred.

Missing fraud is expensive.

Search Engines

High Precision preferred.

Users want relevant results.

Recommendation Systems

Balanced Precision and Recall often desired.

Common Mistakes

Using Accuracy Alone

Accuracy can be misleading.

Ignoring Business Context

Different applications require different priorities.

Chasing Precision Only

High precision with low recall may miss important cases.

Chasing Recall Only

High recall with low precision may generate too many false alarms.

Best Practices

  • Always analyze the confusion matrix first
  • Calculate Precision and Recall together
  • Use F1 Score when classes are imbalanced
  • Select metrics based on business requirements
  • Evaluate models on unseen test data

Precision, Recall, and F1 Score Summary

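| Metric | Formula | Focus |
|---|---|---|
| Precision | TP / (TP + FP) | Prediction Quality |
| Recall | TP / (TP + FN) | Positive Detection |
| F1 Score | 2 × Precision × Recall / (Precision + Recall) (harmonic mean) | Balance |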

Evaluation Workflow

  1. Build confusion matrix
  2. Calculate Precision
  3. Calculate Recall
  4. Calculate F1 Score
  5. Compare models
  6. Select best model
  7. Optimize threshold if needed
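Steps 1 to 6 can be sketched with scikit-learn as follows. The labels and the two candidate models' predictions are hypothetical, and selecting by F1 is only one option; step 7 (threshold tuning) is left out of this sketch.

```python
from sklearn.metrics import confusion_matrix, f1_score, precision_score, recall_score

# Hypothetical test labels and predictions from two candidate models
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
candidates = {
    "model_a": [1, 1, 0, 0, 0, 0, 0, 0, 0, 0],
    "model_b": [1, 1, 1, 1, 1, 1, 0, 0, 0, 0],
}


def evaluate(y_true, y_pred):
    # Step 1: confusion matrix; with labels=[0, 1] ravel() gives TN, FP, FN, TP
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return {
        "tp": int(tp), "fp": int(fp), "fn": int(fn), "tn": int(tn),
        "precision": precision_score(y_true, y_pred),  # Step 2
        "recall": recall_score(y_true, y_pred),        # Step 3
        "f1": f1_score(y_true, y_pred),                # Step 4
    }


# Step 5: compare models side by side
report = {name: evaluate(y_true, y_pred) for name, y_pred in candidates.items()}
for name, m in report.items():
    print(f"{name}: P={m['precision']:.2f} R={m['recall']:.2f} F1={m['f1']:.2f}")
# model_a: P=1.00 R=0.50 F1=0.67
# model_b: P=0.67 R=1.00 F1=0.80

# Step 6: pick the model with the highest F1 (swap in precision or recall
# if the business context calls for it)
best = max(report, key=lambda name: report[name]["f1"])
print("Selected:", best)  # Selected: model_b
```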

Why Precision, Recall, and F1 Score are Important

Precision, Recall, and F1 Score provide a much deeper understanding of classification performance than accuracy alone. They help reveal whether a model is generating false alarms, missing important cases, or maintaining a healthy balance between both.

These metrics are essential because real-world Machine Learning systems often operate on imbalanced datasets where accuracy can be misleading. Understanding these measures enables practitioners to design models that align with business goals and make more reliable decisions.

In the next article, we will explore ROC Curves and AUC, powerful evaluation tools that analyze classifier performance across different probability thresholds rather than relying on a single threshold such as 0.5.


© 2026 ExamAdda Tech. All rights reserved.
