Test Flakiness Budget
A cap on flaky tests that forces teams to fix them.
What a flake budget is
The maximum acceptable percentage of test runs that flake. When the flake rate exceeds the budget, no new tests merge until the flakes are cleaned up.
Typical budget: 1% of CI runs experience a flake.
It forces ownership: the team that adds a flake also fixes it.
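A minimal sketch of the budget check and the merge rule, assuming each CI run is recorded as a single boolean (True if any test in that run failed and then passed on retry). The function names are hypothetical; the 1% figure is the typical budget named above.

```python
from typing import Sequence

FLAKE_BUDGET = 0.01  # 1% of CI runs may experience a flake


def run_flake_rate(runs: Sequence[bool]) -> float:
    """Fraction of CI runs that hit at least one flake (True = flaked)."""
    if not runs:
        return 0.0
    return sum(runs) / len(runs)


def may_merge_new_tests(runs: Sequence[bool], budget: float = FLAKE_BUDGET) -> bool:
    """New tests may merge only while the run-level flake rate is within budget."""
    return run_flake_rate(runs) <= budget
```

For example, 3 flaky runs out of 200 is a 1.5% rate, so `may_merge_new_tests` returns False until the rate is back at or below 1%.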
Measuring flakes
Re-run failed tests on the same commit SHA. If a test passes on retry, mark it as a flake.
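The retry rule can be sketched as a small classifier. This assumes a hypothetical `run_test` callable that executes one test against the already checked-out SHA and returns True on pass; the retry count of 2 is an illustrative choice, not part of the rule above.

```python
from enum import Enum
from typing import Callable


class Outcome(Enum):
    PASS = "pass"
    FLAKE = "flake"  # failed, then passed on a retry of the same SHA
    FAIL = "fail"    # failed on every attempt: treat as a real failure


def classify(run_test: Callable[[], bool], max_retries: int = 2) -> Outcome:
    """Run a test once; on failure, retry up to max_retries times on the same SHA."""
    if run_test():
        return Outcome.PASS
    for _ in range(max_retries):
        if run_test():
            return Outcome.FLAKE
    return Outcome.FAIL
```

A test that fails on every attempt is reported as a genuine failure rather than a flake, so a real regression is not silently absorbed into the flake budget.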
Track the flake rate per suite. Some suites (browser and integration tests) are inherently flakier than others.
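Per-suite tracking is a grouped ratio: flaky executions divided by total executions in each suite. A sketch, assuming a hypothetical `SuiteRun` record that stores the suite name and whether that execution was classified as a flake:

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class SuiteRun:
    suite: str    # e.g. "unit", "integration", "browser" (illustrative names)
    flaked: bool  # True if this execution was classified as a flake


def per_suite_flake_rate(runs: Iterable[SuiteRun]) -> Dict[str, float]:
    """Flake rate per suite: flaky executions / total executions in that suite."""
    totals: Dict[str, int] = defaultdict(int)
    flakes: Dict[str, int] = defaultdict(int)
    for run in runs:
        totals[run.suite] += 1
        if run.flaked:
            flakes[run.suite] += 1
    return {suite: flakes[suite] / totals[suite] for suite in totals}
```

Keeping the rates separate stops a noisy browser suite from hiding an otherwise clean unit suite, or vice versa.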
Tools: BuildPulse, Trunk.io, GitHub's flaky test detection.
When you blow the budget
Halt new test additions. Existing tests keep running, but no new tests merge until the flake rate is back under budget.
Quarantine the worst offenders by moving them to a non-blocking suite.
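Choosing what to quarantine can be sketched as ranking tests by flake count and splitting them out of the blocking suite. The input shape (test ID mapped to flake count), the limit of 5, and the function names are assumptions for illustration; ties are broken by test ID so the selection is stable.

```python
from typing import Dict, List, Set, Tuple


def worst_offenders(flake_counts: Dict[str, int], limit: int = 5) -> List[str]:
    """Test IDs with the most flakes, most flaky first; tests with zero flakes are skipped."""
    ranked: List[Tuple[str, int]] = sorted(
        flake_counts.items(), key=lambda item: (-item[1], item[0])
    )
    return [test_id for test_id, count in ranked[:limit] if count > 0]


def split_suites(all_tests: List[str], quarantined: Set[str]) -> Tuple[List[str], List[str]]:
    """Partition tests into (blocking, non_blocking) lists.

    Non-blocking tests are meant to keep running in a separate job whose
    failures do not block merges.
    """
    blocking = [t for t in all_tests if t not in quarantined]
    non_blocking = [t for t in all_tests if t in quarantined]
    return blocking, non_blocking
```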
Allocate engineering time. Flake fixes don't ship features; they need explicit prioritization.
Preventing new flakes
Code review checks: explicit synchronization, no `sleep()` in tests, deterministic test data.
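To illustrate the first and third review checks, here is a sketch using only the standard library: the test waits on an explicit `threading.Event` instead of calling `sleep()`, and test data comes from a seeded `random.Random`. The worker and user-generation functions are hypothetical stand-ins for real code under test.

```python
import random
import threading


def start_worker(results: list, done: threading.Event) -> threading.Thread:
    """Hypothetical background job that the test depends on."""
    def work() -> None:
        results.append("processed")
        done.set()  # signal completion explicitly instead of making the test guess a delay
    thread = threading.Thread(target=work)
    thread.start()
    return thread


def test_worker_processes_item() -> None:
    results: list = []
    done = threading.Event()
    worker = start_worker(results, done)
    # Flaky pattern: time.sleep(0.1), then hope the worker has finished.
    # Deterministic pattern: block on the signal; the timeout turns a hang into a clear failure.
    assert done.wait(timeout=5.0), "worker did not finish"
    assert results == ["processed"]
    worker.join(timeout=5.0)


def make_test_users(seed: int = 1234) -> list:
    """Deterministic test data: the same seed yields the same users on every run."""
    rng = random.Random(seed)
    return [f"user-{rng.randint(1000, 9999)}" for _ in range(3)]
```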
Pre-merge: run new tests 10 times in CI before allowing the merge.
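The pre-merge repetition can be sketched as a gate that runs each new test N times and reports any failures. The `stress_gate` name and the dict-of-callables input are assumptions; for pytest suites, a plugin such as pytest-repeat (`--count`) is one existing way to get the same repetition.

```python
from typing import Callable, Dict


def stress_gate(tests: Dict[str, Callable[[], None]], runs: int = 10) -> Dict[str, int]:
    """Run each new test `runs` times and return failure counts for tests that failed.

    An empty result means every new test passed every run and the merge may proceed.
    """
    failures: Dict[str, int] = {}
    for name, test in tests.items():
        failed = 0
        for _ in range(runs):
            try:
                test()
            except Exception:  # includes AssertionError raised by a failing test
                failed += 1
        if failed:
            failures[name] = failed
    return failures
```

Ten runs will not catch a flake that occurs once in a thousand runs, but it does catch tests that fail frequently before they reach the shared suite.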
Assign a dedicated test-quality reviewer for tier-1 services.
How to set the budget
Use the current flake rate as the baseline. Budget = baseline * 0.5, reached over 6 months (halve the flake rate within six months).
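The text fixes only the endpoint (half the baseline after six months), not the path there. This sketch assumes a linear monthly ramp that holds at the target afterwards; the function name and the linear schedule are illustrative choices.

```python
def budget_for_month(baseline: float, month: int, months: int = 6, factor: float = 0.5) -> float:
    """Flake budget `month` months after the baseline was measured.

    Assumes a linear ramp from `baseline` down to `baseline * factor` over
    `months` months, then holds at that target.
    """
    if month <= 0:
        return baseline
    progress = min(month, months) / months
    return baseline * (1 - (1 - factor) * progress)
```

With a 4% baseline, the budget is 3% at month 3 and 2% from month 6 onward.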
Set tighter budgets for unit tests (a flake rate under 0.1% is achievable) and looser ones for end-to-end tests (1-5% may be necessary).
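Tiered budgets amount to a lookup from each suite's tier to its limit. In this sketch the 0.1% unit figure follows the guidance above, while the 3% end-to-end value is just one point picked from the 1-5% range; the suite-to-tier mapping and function name are hypothetical.

```python
from typing import Dict, List

# Illustrative tier budgets: 0.1% for unit tests; 3% is an arbitrary pick from the 1-5% e2e range.
TIER_BUDGETS: Dict[str, float] = {"unit": 0.001, "e2e": 0.03}


def suites_over_budget(
    rates: Dict[str, float],
    tiers: Dict[str, str],
    budgets: Dict[str, float] = TIER_BUDGETS,
) -> List[str]:
    """Names of suites whose measured flake rate exceeds the budget for their tier."""
    return sorted(suite for suite, rate in rates.items() if rate > budgets[tiers[suite]])
```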
Publish the budget and current rate weekly. Visibility drives the cleanup.