AI & ML Beginner By Samson Tanimawo, PhD Published Feb 4, 2025 9 min read

Overfitting: The First ML Problem Every Beginner Meets

Your model scores 99% on training data and 62% on new data. You have met overfitting. Here is what it is, why it happens, and the five techniques that reliably fix it.

What overfitting actually is

Overfitting is when a model memorises its training data instead of learning general patterns. It scores beautifully on what it has seen and fails on what it hasn’t.

The classical image: fit a polynomial of degree 10 through 10 noisy points. The curve touches every point perfectly but wiggles insanely between them. Show it an 11th point, and the prediction is nonsense. The model has captured the noise in the training set, not the signal.
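This is easy to reproduce. Below is a minimal NumPy sketch; the sine signal, noise level, and polynomial degrees are arbitrary illustrative choices. (Degree 9 is the lowest degree that passes exactly through 10 points, so its training error is essentially zero, while a cubic is forced to smooth over the noise.)

```python
import numpy as np

rng = np.random.default_rng(0)

# 10 noisy samples of a simple underlying signal
x_train = np.linspace(0, 1, 10)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.2, size=10)

# Degree-9 polynomial: enough freedom to pass through all 10 points
overfit = np.polyfit(x_train, y_train, deg=9)
# Degree-3 polynomial: forced to average out the noise
simple = np.polyfit(x_train, y_train, deg=3)

def mse(coeffs, x, y):
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

# Training error: the wiggly fit "wins" (its error is ~0)
train_overfit = mse(overfit, x_train, y_train)
train_simple = mse(simple, x_train, y_train)

# Error against the true signal at unseen points: the wiggly fit loses
x_new = np.linspace(0.05, 0.95, 200)
y_new = np.sin(2 * np.pi * x_new)
test_overfit = mse(overfit, x_new, y_new)
test_simple = mse(simple, x_new, y_new)
```

The degree-9 curve captures the noise in the 10 training points, so between them it oscillates wildly; the cubic's training error is worse, but its error on unseen points is far better.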

Neural networks overfit by the same mechanism. They have so many adjustable weights that they can memorise arbitrary training examples. If you let them, they will.

A concrete example with numbers

You’re building a spam classifier. You have 1,000 labelled emails. You train a big model on all of them. It reaches 99.5% accuracy. Impressive.

You deploy. A week later, real-world accuracy is 72%.

What happened: the model memorised quirks of those 1,000 specific emails. Your friend Ahmed sent an email saying “Hi, still on for Friday?”, and it was labelled non-spam. The model learned: “emails mentioning Friday from Ahmed are not spam.” In production, that rule doesn’t generalise to emails mentioning Friday from strangers.

The 99.5% was real. It was also useless. Training accuracy on the data the model has seen is almost always misleading about generalisation.

Why overfitting happens

Three mechanisms, often combining:

  1. Too little data. With few examples, noise and coincidence look like pattern, and nothing in the training set contradicts them.
  2. Too much capacity. A model with far more parameters than training examples can memorise the whole set outright.
  3. Training too long. Models tend to learn broad patterns first and noise later; keep training past that point and memorisation sets in.

The remedies below address each of these.

Four warning signs

  1. Training accuracy much higher than validation accuracy. This is the textbook symptom. If train = 98% and validation = 75%, you’re overfitting.
  2. Validation loss starts rising while training loss keeps falling. Classic late-stage overfit. The model is starting to memorise.
  3. Model makes wildly different predictions on two almost-identical inputs. A sign the decision boundary has absorbed noise.
  4. A 1% change in training data changes the model’s predictions by 20%+. High variance means the model is not finding a stable pattern.
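Signs 1 and 2 can be checked mechanically from your training logs. A minimal sketch — the function name, the 0.10 gap threshold, and the 3-epoch patience window are illustrative choices, not a standard API:

```python
def overfitting_warnings(train_acc, val_acc, val_losses,
                         gap_threshold=0.10, patience=3):
    """Flag the two most common overfitting symptoms from training logs."""
    warnings = []
    # Sign 1: training accuracy far above validation accuracy
    if train_acc - val_acc > gap_threshold:
        warnings.append(f"train/val gap {train_acc - val_acc:.2f} "
                        f"exceeds {gap_threshold}")
    # Sign 2: validation loss rising for `patience` consecutive epochs
    rises = 0
    for prev, cur in zip(val_losses, val_losses[1:]):
        rises = rises + 1 if cur > prev else 0
        if rises >= patience:
            warnings.append("validation loss rising while training continues")
            break
    return warnings

# The textbook case from sign 1, plus a rising validation-loss tail:
flags = overfitting_warnings(0.98, 0.75, [0.50, 0.48, 0.50, 0.52, 0.55])
# A healthy run trips neither check:
clean = overfitting_warnings(0.90, 0.88, [0.50, 0.40, 0.30])
```

Signs 3 and 4 need perturbed inputs or retrained models to measure, so they are harder to automate, but the first two catch most cases.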

Five proven fixes, in order of effectiveness

Try them in this order. In most cases, you don’t need more than three.

  1. More data. The single biggest lever: the more examples the model sees, the harder they are to memorise, and the more often spurious patterns get contradicted. As a rough rule of thumb, an order of magnitude more data shrinks overfitting dramatically. Before any clever regularisation, ask: can we get more data?
  2. Simpler model. If your model has 50 million parameters and your dataset has 5,000 examples, you don’t need a sophisticated regulariser; you need a 500,000-parameter model. Resist the reflex to reach for the biggest architecture.
  3. Early stopping. Track validation loss during training and stop once it has not improved for a few consecutive epochs, keeping the weights from the best epoch. In practice, this single technique handles a large share of overfitting.
  4. Dropout. During training, randomly zero out a fraction (typically 10–50%) of neuron activations in each layer; at inference, dropout is switched off. This prevents any single neuron from becoming essential and forces the network to spread knowledge across weights. One of the most effective regularisers ever invented.
  5. Weight decay (L2 regularisation). Add a small penalty proportional to the squared magnitude of the weights to the loss function. This keeps weights small and, by extension, keeps the model from becoming overly flexible.
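Early stopping (fix 3) is simple enough to hand-roll. A sketch of the loop, with a synthetic U-shaped validation curve standing in for a real model — the function names and the patience value are illustrative, not a framework API:

```python
def train_with_early_stopping(epochs, train_step, validate, patience=5):
    """Stop when validation loss hasn't improved for `patience` epochs;
    return the epoch whose weights you would keep."""
    best_loss, best_epoch, stale = float("inf"), 0, 0
    for epoch in range(epochs):
        train_step(epoch)           # one pass over the training data
        val_loss = validate(epoch)  # loss on the held-out validation set
        if val_loss < best_loss:
            best_loss, best_epoch, stale = val_loss, epoch, 0
            # in a real loop: checkpoint the model weights here
        else:
            stale += 1
            if stale >= patience:
                break               # validation loss stopped improving
    return best_epoch, best_loss

# Synthetic validation curve: falls until epoch 10, then rises again.
best_epoch, best_loss = train_with_early_stopping(
    epochs=100,
    train_step=lambda epoch: None,          # stand-in for a training pass
    validate=lambda epoch: (epoch - 10) ** 2,
    patience=5,
)
# Stops after 5 stale epochs and keeps epoch 10's weights.
```

The patience window matters: validation loss is noisy, so stopping at the very first uptick (rather than after several stale epochs) often quits too early.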

Modern deep-learning frameworks have all of these as one-line config options. Use them.
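Weight decay’s effect is easiest to see in linear regression, where the L2 penalty has a closed-form solution (ridge regression). A minimal NumPy sketch — the random data and the λ = 10 penalty are arbitrary illustrations, not recommendations:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(20, 10))                       # 20 examples, 10 features
y = X @ rng.normal(size=10) + rng.normal(0, 0.1, size=20)

def ridge(X, y, lam):
    # Closed-form minimiser of ||Xw - y||^2 + lam * ||w||^2
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

w_free = ridge(X, y, 0.0)     # ordinary least squares: no penalty
w_decayed = ridge(X, y, 10.0) # strong weight decay shrinks the weights
```

Increasing λ always shrinks the weight vector’s norm, which is exactly the “keep the model from becoming overly flexible” effect described above.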

What not to do

  1. Don’t judge a model by training accuracy. As the spam example showed, accuracy on data the model has seen says almost nothing about generalisation.
  2. Don’t tune against your test set. The moment you adjust hyperparameters to improve test performance, the test set stops being an honest estimate and becomes part of training.
  3. Don’t stack every regulariser at once. Apply the fixes above one at a time so you know which one actually helped.

How to detect overfitting in production

Overfitting on the training set is easy to catch with a validation set. Overfitting to recent production data is sneakier because your production “validation set” is the world and it moves under you.

Watch for:

  1. Accuracy on a fixed historical holdout degrading after each retrain, even while metrics on recent traffic look strong.
  2. Predictions swinging sharply between retrains on inputs that haven’t changed.
  3. Performance that tracks last month’s traffic closely but falls off on seasonal or rare cases.

Regular retraining on a mix of recent and historical data keeps the model grounded. Retraining only on recent data accelerates this form of overfitting.
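One way to keep this honest is to freeze a few historical evaluation slices at first deployment and re-score them after every retrain. A sketch — the function, slice names, baseline numbers, and the 0.02 tolerance are all hypothetical:

```python
def drift_report(baseline, current, tolerance=0.02):
    """Compare a freshly retrained model's accuracy on fixed historical
    slices against the baseline recorded at first deployment.
    Returns {slice_name: (baseline_acc, current_acc)} for slices that
    degraded by more than `tolerance`."""
    return {
        slice_name: (baseline[slice_name], acc)
        for slice_name, acc in current.items()
        if baseline[slice_name] - acc > tolerance
    }

# Hypothetical slices: accuracy at deployment vs. after the latest retrain.
flags = drift_report(
    baseline={"2023_q4": 0.91, "2024_q1": 0.90},
    current={"2023_q4": 0.85, "2024_q1": 0.895},
)
# Only the slice that lost more than 2 points is flagged.
```

Any flagged slice is a signal that the latest retrain has started fitting recent traffic at the expense of the patterns the historical data represents.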