Flaky Test Discipline

Flaky tests erode trust. The discipline.

The cost of flaky tests

Flakes destroy CI trust faster than anything else. Above 1% flake rate, "is it broken or flaky?" becomes the default question, retry-until-green becomes the default behaviour, and real bugs slip through because nobody trusts a red CI.

Detection

Detection is mechanical. Track per-test pass rate over 100 runs, use tooling to flag flakes automatically, manually retest with same-SHA when in doubt.

Quarantine flakes fast

Quarantine within 24 hours of confirmation. Quarantined tests get a deadline to fix or delete, the quarantine list itself is bounded so it does not accumulate into a graveyard.

How to fix flakes

Most flakes are timing or shared-state. Fix with explicit synchronisation, isolated fixtures, deterministic IDs. When the test surfaces a real production race, fix the bug rather than the test.

How to install the discipline

Three pieces install the discipline: track flake rate as a CI health metric, block merges when the flake threshold is exceeded, allocate explicit engineering time to flake fixes.