Test Flakiness Budget

A flake budget caps how many flaky tests CI is allowed to carry, and forces cleanup once the cap is exceeded.

What a flake budget is

A flake budget caps how flaky CI is allowed to get. While the flake rate is above the cap, no new tests are added until cleanup brings it back under. The cap is what keeps flakes from compounding into a culture where everyone silently re-runs failures by default.

Measuring flakes

Measurement starts with re-runs on the same SHA. A test that fails and then passes on retry, with no code change in between, is the flake signal. Tracking that signal per suite surfaces the worst offenders, so cleanup time goes to the highest-leverage fixes first.
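
The measurement above can be sketched as follows. This is a minimal illustration, not a reference implementation: the `RunRecord` shape and the function names are hypothetical, and it treats any test with mixed pass/fail outcomes on one SHA as flaky, which is slightly broader than strict pass-on-retry.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class RunRecord:
    """One execution of one test (hypothetical record shape)."""
    suite: str
    test: str
    sha: str
    attempt: int  # 1 for the first run, 2+ for retries on the same SHA
    passed: bool


def find_flaky(runs):
    """Return (suite, test, sha) keys that both failed and passed on one SHA."""
    outcomes = defaultdict(set)
    for r in runs:
        outcomes[(r.suite, r.test, r.sha)].add(r.passed)
    return {key for key, seen in outcomes.items() if seen == {True, False}}


def flake_rate_by_suite(runs):
    """Return [(suite, flake_rate)] sorted worst-first.

    The rate is flaky (test, sha) pairs divided by executed (test, sha) pairs.
    """
    executed = defaultdict(set)
    for r in runs:
        executed[r.suite].add((r.test, r.sha))

    flaky_counts = defaultdict(int)
    for suite, _test, _sha in find_flaky(runs):
        flaky_counts[suite] += 1

    rates = {
        suite: flaky_counts[suite] / len(pairs)
        for suite, pairs in executed.items()
    }
    return sorted(rates.items(), key=lambda kv: kv[1], reverse=True)
```

Sorting worst-first means the head of the list is the natural starting point for cleanup.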

When you blow the budget

The over-budget response is automatic and pre-agreed. Stop new tests, quarantine the worst offenders, and allocate explicit cleanup time so the recovery does not depend on a project manager remembering to schedule it.
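
One way to make the response automatic is to encode it as a function that CI evaluates on every report. The sketch below assumes a per-test flake rate is already available; `BudgetResponse`, `respond`, and the default values for `quarantine_top_n` and `cleanup_hours` are illustrative placeholders, not recommended settings.

```python
from dataclasses import dataclass, field


@dataclass
class BudgetResponse:
    """Pre-agreed actions for one budget check (hypothetical shape)."""
    over_budget: bool
    block_new_tests: bool
    quarantine: list = field(default_factory=list)
    cleanup_hours: int = 0


def respond(flake_rate, budget, rates_by_test, quarantine_top_n=3, cleanup_hours=8):
    """Return the agreed response for the current flake rate.

    Under budget: no action. Over budget: block new tests, quarantine the
    worst offenders, and reserve explicit cleanup time.
    """
    if flake_rate <= budget:
        return BudgetResponse(over_budget=False, block_new_tests=False)

    # Only tests that actually flaked are candidates for quarantine.
    offenders = [t for t, rate in rates_by_test.items() if rate > 0]
    worst = sorted(offenders, key=rates_by_test.get, reverse=True)[:quarantine_top_n]
    return BudgetResponse(
        over_budget=True,
        block_new_tests=True,
        quarantine=worst,
        cleanup_hours=cleanup_hours,
    )
```

Because the decision is a pure function of the measured rate and the budget, nobody has to remember to trigger it.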

Preventing new flakes

Prevention beats cleanup. Code-review checks, running new or changed tests several times before merge, and a dedicated test-quality reviewer for tier-one services keep the rate at which flakes are added below the rate at which they are cleaned up.
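
A pre-merge multi-run gate can be as simple as repeating a test and refusing the merge on any failure. The sketch below assumes the test is exposed as a zero-argument callable that raises on failure; `count_failures`, `merge_allowed`, and the default of 20 runs are hypothetical choices for illustration.

```python
from typing import Callable


def count_failures(test_fn: Callable[[], None], runs: int = 20) -> int:
    """Run test_fn `runs` times and count how many runs raised."""
    failures = 0
    for _ in range(runs):
        try:
            test_fn()
        except Exception:
            # Any exception, including AssertionError, counts as a failed run.
            failures += 1
    return failures


def merge_allowed(test_fn: Callable[[], None], runs: int = 20) -> bool:
    """Allow the merge only if every repeated run passed."""
    return count_failures(test_fn, runs) == 0
```

Repetition only catches flakes that show up within the chosen number of runs, so a rare flake can still slip through; the gate lowers the add rate rather than eliminating it.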

How to set the budget

Set the budget from the current baseline. Make it tighter for unit tests and looser for end-to-end tests, and publish it weekly so that the bar drifts down rather than the metric drifting up.
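
The ratchet described above can be sketched as a weekly update that only ever lowers the budget. The tier names, the floor values in `TIER_FLOORS`, and both function names are assumptions made for illustration; real floors would come from your own baseline.

```python
# Hypothetical floors: the lowest budget each tier is expected to reach.
# Unit tests get a much tighter floor than end-to-end tests.
TIER_FLOORS = {"unit": 0.001, "e2e": 0.02}


def initial_budgets(baseline):
    """Start each tier's budget at its measured baseline flake rate."""
    return {tier: max(rate, TIER_FLOORS[tier]) for tier, rate in baseline.items()}


def ratchet(budgets, observed):
    """Weekly update: a budget may tighten to the observed rate, never loosen.

    If the observed rate is worse than the budget, the budget stays put,
    so a bad week shows up as a breach rather than as a relaxed bar.
    """
    return {
        tier: max(TIER_FLOORS[tier], min(budget, observed.get(tier, budget)))
        for tier, budget in budgets.items()
    }
```

For example, with a baseline of 1% for unit tests and 8% for end-to-end tests, a week that observes 0.4% and 10% tightens the unit budget to 0.4% and leaves the end-to-end budget at 8%.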