Flaky Test Discipline
Flaky tests erode trust. The discipline.
The cost of flaky tests
Flakes destroy CI trust. Engineers retry until green; bugs slip through; broken builds get ignored.
Flake rate above 1% of runs makes "is it broken or flaky?" the default question.
Flakes are not a sign of bad luck. They're a sign of timing-dependent or environment-dependent code under test.
Detection
Track per-test pass rate over the last 100 runs. Anything below 99% is suspect.
Tools: BuildPulse, Trunk.io, RunForCover. Integrate with CI to flag flakes automatically.
Manual: run failing tests on the same SHA 10 times. If 2 fail, it's flaky.
Quarantine flakes fast
Within 24 hours of detection, quarantine the test. Move to a non-blocking test suite.
Quarantined tests get a deadline: fix or delete in 2 weeks.
Don't let quarantine become a graveyard. Track quarantine list size; above 5% is a sign of giving up.
How to fix flakes
Most flakes are timing: implicit waits, race conditions, shared state.
Fix patterns: explicit synchronization, isolated test fixtures, deterministic IDs.
If the flake represents a real production race, the test is correctly catching the bug. Fix the bug, not the test.
How to install the discipline
Track flake rate as a CI health metric. Publish weekly.
Block merges when flake rate exceeds threshold for the affected suite.
Allocate engineering time. Flake fixes don't ship features; they need explicit prioritization.