By Samson Tanimawo, PhD · Published Mar 10, 2026 · 6 min read

Data Contamination in ML Benchmarks

If your benchmark questions appeared in the training data, the model isn’t reasoning; it’s remembering. Contamination has quietly broken many published comparisons.

What contamination is

Benchmark contamination occurs when test data ends up in training data, often by accident. The model has effectively memorised the test, so its score reflects recall rather than generalisation. Reported accuracy is inflated, sometimes dramatically.
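To make the inflation concrete, here is a minimal sketch with made-up numbers (the function name and the 30%/60% figures are illustrative assumptions, not from the article): if a slice of the test set leaked and the model recalls those items near-perfectly, the reported score blends recall with true skill.

```python
# Hedged illustration: how partial memorisation inflates a benchmark score.
# All numbers below are invented for the example.

def reported_accuracy(contaminated_frac: float,
                      memorised_acc: float,
                      true_acc: float) -> float:
    """Blend near-perfect recall on leaked items with true skill on the rest."""
    return contaminated_frac * memorised_acc + (1 - contaminated_frac) * true_acc

# If 30% of the test leaked, a model with 60% real accuracy reports 72%:
print(reported_accuracy(0.3, 1.0, 0.6))  # → 0.72
```

Twelve points of headline accuracy here come from memorisation alone, which is exactly why contaminated comparisons mislead.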

How it happens

Common routes, none of them malicious:

- Benchmark files are hosted publicly (GitHub, Hugging Face, Kaggle) and swept up by web-scale crawls.
- Papers, blog posts, and forum answers quote test items verbatim.
- Users paste benchmark questions into chat interfaces whose logs later feed training.

Detection

Two practical tests:

- N-gram overlap: search the training corpus for long exact (or near-exact) substrings of each test item.
- Guided completion: prompt the model with the first part of a test item and check whether it reproduces the remainder verbatim.

Neither is perfect, but together they catch most cases. Modern benchmark releases increasingly ship with contamination audits.
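The overlap test can be sketched in a few lines. This is a simplified, word-level version under stated assumptions (`contamination_rate`, the corpus/test variable names, and the default `n=8` are illustrative choices, not a standard API; production checks typically normalise text more aggressively and stream the corpus rather than hold all n-grams in memory):

```python
# Hedged sketch: word-level n-gram overlap contamination check.
from typing import Iterable, Iterator

def ngrams(text: str, n: int) -> Iterator[str]:
    """Yield lowercase word-level n-grams from text."""
    words = text.lower().split()
    for i in range(len(words) - n + 1):
        yield " ".join(words[i:i + n])

def contamination_rate(train_docs: Iterable[str],
                       test_items: list[str],
                       n: int = 8) -> float:
    """Fraction of test items sharing at least one n-gram with the corpus."""
    train_grams: set[str] = set()
    for doc in train_docs:
        train_grams.update(ngrams(doc, n))
    flagged = sum(
        1 for item in test_items
        if any(g in train_grams for g in ngrams(item, n))
    )
    return flagged / len(test_items)
```

A flagged item is only a candidate: short or formulaic items collide by chance, so hits are usually reviewed by hand before the item is dropped.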

Avoiding it in your own evals

For any internal eval that drives decisions:

- Keep a held-out set that is never published, never committed to a public repo, and never pasted into third-party tools that may train on inputs.
- Scrub eval items from any training or fine-tuning data you ship.
- Refresh items periodically, and prefer items written after the model's training cutoff.

The mature stance: treat every published benchmark number as an upper bound. Unpublished, internal, post-2025 data is the only reliable signal.