Overfitting and Underfitting

When we train a machine learning model, the real goal is generalisation: performing well on new, unseen data, not just the data it was trained on. Two common problems get in the way of this goal: underfitting and overfitting. Understanding both is essential to building models that actually work in the real world.

In short: an underfit model is too simple to learn the pattern, and an overfit model is too complex and memorises the data instead of learning the pattern. The sweet spot in between is a good fit.

💡 In one line: Underfitting = the model learns too little; Overfitting = the model learns too much (including the noise). We want the balance in between.

What is Underfitting?

Underfitting happens when a model is too simple to capture the underlying pattern in the data. It performs poorly on both the training data and new data.

Signs of underfitting:

  • High error on the training set
  • High error on the test set
  • The model is too basic for the problem

Common causes:

  • The model is too simple (e.g. a straight line for curved data)
  • Too few informative features, or training stopped too early
  • Too much regularisation (over-restricting the model)
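
As a minimal sketch of the first cause, the pure-Python example below fits a straight line to data generated from y = x². The toy data, the held-out points, and the helper names fit_line and mse are assumptions made up for this illustration; they do not come from any library.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mse(xs, ys, slope, intercept):
    """Mean squared error of the line on the points (xs, ys)."""
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Toy curved data (assumed for illustration): y = x^2 with no noise.
train_x = [-3, -2, -1, 0, 1, 2, 3]
train_y = [x ** 2 for x in train_x]

# Held-out points the model never saw during fitting.
test_x = [-2.5, -1.5, -0.5, 0.5, 1.5, 2.5]
test_y = [x ** 2 for x in test_x]

slope, intercept = fit_line(train_x, train_y)
train_error = mse(train_x, train_y, slope, intercept)
test_error = mse(test_x, test_y, slope, intercept)

print(f"line: y = {slope:.2f} * x + {intercept:.2f}")
print(f"train MSE = {train_error:.2f}, test MSE = {test_error:.2f}")
```

On this data the best straight line is flat at y = 4, giving an MSE of about 12 on the training points and about 7.4 on the held-out points. Both errors are large because no straight line can follow the curve, which is the signature of underfitting.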

What is Overfitting?

Overfitting happens when a model is too complex and learns the training data too well — including its random noise and quirks. It performs brilliantly on training data but poorly on new data, because it memorised instead of generalising.

Signs of overfitting:

  • Very low error on the training set
  • High error on the test set (a big gap between the two)
  • The model is overly complex

Common causes:

  • The model is too complex for the amount of data
  • Too little training data
  • Training for too long
  • Too many features
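
The sketch below shows memorisation in its purest form, under stated assumptions. The true pattern is y = x, the disturbances added to the training targets are hand-picked values standing in for random noise, and the "model" is a 1-nearest-neighbour rule that simply copies the target of the closest training point. All names and numbers are made up for illustration.

```python
def nearest_neighbour_predict(train_x, train_y, x):
    """Predict by copying the target of the closest training point (1-NN)."""
    nearest = min(range(len(train_x)), key=lambda j: abs(train_x[j] - x))
    return train_y[nearest]


def mse(predictions, targets):
    """Mean squared error between two equally long sequences."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


# Toy data (assumed): true pattern y = x plus hand-picked "noise" values.
train_x = list(range(7))
noise = [0.9, -1.1, 0.8, -0.7, 1.2, -0.9, 1.0]
train_y = [x + e for x, e in zip(train_x, noise)]

# New points between the training points; their targets follow the true pattern.
test_x = [x + 0.4 for x in range(7)]
test_y = list(test_x)

train_pred = [nearest_neighbour_predict(train_x, train_y, x) for x in train_x]
test_pred = [nearest_neighbour_predict(train_x, train_y, x) for x in test_x]

train_error = mse(train_pred, train_y)
test_error = mse(test_pred, test_y)
print(f"train MSE = {train_error:.2f}, test MSE = {test_error:.2f}")
```

The training error is exactly 0 because every training point is its own nearest neighbour, while the test error is about 0.94: the model reproduces the noise it memorised instead of the underlying line.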

Underfitting vs. Overfitting

Aspect            Underfitting                 Overfitting
----------------  ---------------------------  ---------------------------------------
Model complexity  Too simple                   Too complex
Training error    High                         Very low
Test error        High                         High
What went wrong   Failed to learn the pattern  Memorised noise, not the pattern
Generalisation    Poor                         Poor
Analogy           Didn't study enough          Memorised answers without understanding

Notice that both lead to poor performance on new data — they just fail in opposite ways.
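
The comparison above can be turned into a rough diagnostic. The function below is only a sketch: the name diagnose and both thresholds (acceptable_error and max_gap) are hypothetical, and sensible values depend entirely on the problem and the error metric.

```python
def diagnose(train_error, test_error, acceptable_error, max_gap):
    """Classify a fit from its training and test error.

    acceptable_error and max_gap are problem-specific thresholds
    chosen by the user (assumed here, not a standard).
    """
    if train_error > acceptable_error:
        # The model cannot even fit the data it was trained on.
        return "underfitting"
    if test_error - train_error > max_gap:
        # Good on training data, much worse on new data.
        return "overfitting"
    return "good fit"


print(diagnose(train_error=12.0, test_error=7.4, acceptable_error=1.0, max_gap=0.5))
print(diagnose(train_error=0.0, test_error=0.9, acceptable_error=1.0, max_gap=0.5))
print(diagnose(train_error=0.2, test_error=0.3, acceptable_error=1.0, max_gap=0.5))
```

The three calls print "underfitting", "overfitting" and "good fit" in that order. Training error is checked first because a large gap only means something once the model fits the training data reasonably well.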

The Bias–Variance Trade-off

Underfitting and overfitting are two sides of a deeper idea called the bias–variance trade-off:

  • High Bias → Underfitting. The model makes strong, oversimplified assumptions and misses the real pattern.
  • High Variance → Overfitting. The model is overly sensitive to the training data, so its predictions change wildly when that data changes only slightly.
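
For squared-error loss, the two terms can be written down explicitly. The notation here is introduced only for this illustration: the observed target is y = f(x) + ε with zero-mean noise of variance σ², f̂ is the model fitted on a randomly drawn training set, and the expectation is taken over training sets and noise.

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

A very simple model keeps the variance term small but pays for it with a large bias term; a very flexible model does the reverse. The noise term is a floor that no model can go below.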

As model complexity increases, training error keeps falling — but test error first falls, then rises again once the model starts overfitting. The lowest point of the test-error curve is the best fit.
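
The shape of these two curves can be reproduced on a toy problem. The sketch below uses k-nearest-neighbour averaging as a stand-in model, where a smaller k means a more flexible model. Everything in it is an assumption for illustration: the true pattern is y = x, and the disturbance alternates between +1 and -1 so the run is reproducible. That regular pattern flatters even values of k, so real noisy data would give a less tidy curve.

```python
def knn_predict(train_x, train_y, x, k):
    """Average the targets of the k training points closest to x."""
    order = sorted(range(len(train_x)), key=lambda j: abs(train_x[j] - x))
    neighbours = order[:k]
    return sum(train_y[j] for j in neighbours) / k


def knn_mse(train_x, train_y, xs, ys, k):
    """Mean squared error of k-NN predictions on the points (xs, ys)."""
    errors = [(knn_predict(train_x, train_y, x, k) - y) ** 2 for x, y in zip(xs, ys)]
    return sum(errors) / len(errors)


# Toy data (assumed): true pattern y = x, disturbed by alternating +1 / -1.
train_x = list(range(12))
train_y = [x + (1 if x % 2 == 0 else -1) for x in train_x]

# Held-out points between the training points, following the true pattern.
test_x = [x + 0.4 for x in range(11)]
test_y = list(test_x)

# Smaller k = more flexible model, so this order runs from simple to complex.
results = {}
for k in [12, 4, 2, 1]:
    train_error = knn_mse(train_x, train_y, train_x, train_y, k)
    test_error = knn_mse(train_x, train_y, test_x, test_y, k)
    results[k] = (train_error, test_error)
    print(f"k={k:2d}  train MSE={train_error:6.2f}  test MSE={test_error:6.2f}")

best_k = min(results, key=lambda k: results[k][1])
print("lowest test error at k =", best_k)
```

From the simplest model (k = 12) to the most flexible (k = 1), the training MSE falls steadily (about 11.92, 1.25, 1.08, 0.00), while the test MSE falls and then rises again (about 10.01, 0.19, 0.01, 1.09). The lowest test error is at k = 2, which is neither the simplest nor the most flexible setting.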

How to Fix Underfitting

  • Use a more complex model (e.g. add layers or use a more powerful algorithm)
  • Add more relevant features
  • Train longer so the model can learn more
  • Reduce regularisation that may be over-restricting it
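
As one hedged illustration of "add more relevant features", the sketch below fits the same straight-line model twice to data generated from y = x²: once on the raw input x, and once on an engineered feature x². The data and the helper names are assumptions for this example, and the engineered feature is deliberately the ideal one, which real problems rarely offer.

```python
def fit_line(features, targets):
    """Ordinary least-squares fit of target = slope * feature + intercept."""
    n = len(features)
    mean_f = sum(features) / n
    mean_t = sum(targets) / n
    sff = sum((f - mean_f) ** 2 for f in features)
    sft = sum((f - mean_f) * (t - mean_t) for f, t in zip(features, targets))
    slope = sft / sff
    return slope, mean_t - slope * mean_f


def mse(features, targets, slope, intercept):
    """Mean squared error of the fitted line on the given points."""
    residuals = [(slope * f + intercept - t) ** 2 for f, t in zip(features, targets)]
    return sum(residuals) / len(residuals)


# Toy curved data (assumed): y = x^2, with separate held-out points.
train_x = [-3, -2, -1, 0, 1, 2, 3]
train_y = [x ** 2 for x in train_x]
test_x = [-2.5, -1.5, -0.5, 0.5, 1.5, 2.5]
test_y = [x ** 2 for x in test_x]

# Model 1: straight line on the raw feature x (underfits).
slope_raw, intercept_raw = fit_line(train_x, train_y)
raw_test_error = mse(test_x, test_y, slope_raw, intercept_raw)

# Model 2: the same kind of model on an engineered feature x^2.
train_sq = [x ** 2 for x in train_x]
test_sq = [x ** 2 for x in test_x]
slope_sq, intercept_sq = fit_line(train_sq, train_y)
sq_test_error = mse(test_sq, test_y, slope_sq, intercept_sq)

print(f"raw feature:        test MSE = {raw_test_error:.2f}")
print(f"engineered feature: test MSE = {sq_test_error:.2f}")
```

The learning algorithm is unchanged between the two runs; only the input it sees differs. With the raw feature the test MSE is about 7.4, and with the squared feature it drops to 0, because the pattern is now within reach of a straight line.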

How to Fix Overfitting

  • Get more training data (often the most effective fix when more data is available)
  • Use a simpler model
  • Apply regularisation (L1/L2) to penalise complexity
  • Use dropout in neural networks
  • Use cross-validation to detect overfitting and to compare settings on held-out data
  • Apply early stopping: halt training once error on held-out data stops improving
  • Remove irrelevant features
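
Of the fixes above, early stopping is the easiest to sketch in a few lines. The function below is a simplified, framework-independent version. The name early_stopping, the patience parameter, and the validation-loss values are all hypothetical; in practice the losses would be measured on a held-out validation set after each training epoch.

```python
def early_stopping(val_losses, patience=2):
    """Return (best_epoch, stop_epoch) for a sequence of validation losses.

    Training stops once the loss has failed to improve for
    `patience` consecutive epochs.
    """
    best_loss = float("inf")
    best_epoch = -1
    bad_epochs = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            # New best: remember it and reset the counter.
            best_loss, best_epoch, bad_epochs = loss, epoch, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                return best_epoch, epoch
    # Ran out of epochs without triggering the stop rule.
    return best_epoch, len(val_losses) - 1


# Made-up validation losses: improving at first, then rising as overfitting sets in.
val_losses = [0.90, 0.70, 0.55, 0.50, 0.52, 0.56, 0.61, 0.70]
best_epoch, stop_epoch = early_stopping(val_losses, patience=2)
print(f"best epoch = {best_epoch}, stopped at epoch = {stop_epoch}")
```

With these numbers the loss is lowest at epoch 3 and training halts at epoch 5, after two epochs without improvement. A real training loop would also keep a copy of the model weights from the best epoch and restore them after stopping.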

A Simple Analogy

Think of a student preparing for an exam using practice papers:

  • Underfitting — they barely studied, so they fail even the practice questions.
  • Overfitting — they memorised the exact practice answers, so they ace the practice papers but fail the real exam with new questions.
  • Good fit — they understood the underlying concepts, so they handle both the practice papers and new exam questions well.

The good-fit student has done what we want every model to do: learn the concept, not memorise the examples.

Summary

  • The goal of an ML model is generalisation — performing well on unseen data.
  • Underfitting (high bias) means the model is too simple and performs poorly everywhere.
  • Overfitting (high variance) means the model is too complex, acing training data but failing on new data.
  • The two are connected by the bias–variance trade-off, with the best fit sitting between them.
  • Fix underfitting with more complexity/features; fix overfitting with more data, simpler models, regularisation, and early stopping.