Multi-Window Burn-Rate Alerts: A Deep Dive

Multi-window burn-rate alerting is the modern way to page on an SLO. The math takes about an afternoon to learn, and it applies to every SLO you write afterwards.

The single-window failure

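Burn rate is how fast a service consumes its error budget, relative to the rate that would exhaust the budget exactly at the end of the SLO period. An alert that evaluates burn rate over a single short window fires on any brief blip. It treats "1 minute of badness" as if it were real budget consumption; the result is flapping pages and on-call attrition.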
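To make the gap concrete, here is a minimal sketch of the arithmetic. The 99.9% target, the 30-day period, and the function names are assumptions chosen for illustration; the calculation also assumes steady traffic.

```python
# Assumed for illustration: a 99.9% SLO measured over 30 days.
SLO_TARGET = 0.999
SLO_PERIOD_MINUTES = 30 * 24 * 60


def burn_rate(error_ratio: float, slo_target: float = SLO_TARGET) -> float:
    """Observed error ratio divided by the allowed error ratio (1 - target)."""
    return error_ratio / (1.0 - slo_target)


def budget_consumed(error_ratio: float, duration_minutes: float,
                    slo_target: float = SLO_TARGET,
                    period_minutes: float = SLO_PERIOD_MINUTES) -> float:
    """Fraction of the period's error budget spent by one incident,
    assuming traffic is steady across the period."""
    return burn_rate(error_ratio, slo_target) * duration_minutes / period_minutes


# One minute at a 5% error ratio, seen through a one-minute window.
blip_rate = burn_rate(error_ratio=0.05)
blip_budget = budget_consumed(error_ratio=0.05, duration_minutes=1)
print(f"burn rate: {blip_rate:.0f}x, budget used: {blip_budget:.2%}")
```

Under these assumptions the blip shows a burn rate of roughly 50, far above any common paging threshold, while spending only about a tenth of a percent of the monthly budget.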

Multi-window confirmation

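The multi-window pattern evaluates the same burn-rate threshold over two windows of different lengths: a long window that shows a meaningful share of the budget has been spent, and a short window that confirms the burn is still happening. Both must exceed the threshold before the alert fires; this is the structural fix for the single-window problem.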
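The confirmation logic can be sketched in a few lines. The class and function names are hypothetical, and the 5-minute short window is an assumption borrowed from the SRE workbook's convention of using 1/12 of the long window.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WindowPair:
    long_window: str   # e.g. "1h": proves meaningful budget was spent
    short_window: str  # e.g. "5m": proves the burn is still ongoing
    threshold: float   # burn-rate threshold applied to both windows


def should_fire(pair: WindowPair, long_burn: float, short_burn: float) -> bool:
    """Fire only when both windows exceed the same burn-rate threshold."""
    return long_burn > pair.threshold and short_burn > pair.threshold


fast_burn = WindowPair(long_window="1h", short_window="5m", threshold=14.4)

# A brief blip: the short window spikes, the long window barely moves.
print(should_fire(fast_burn, long_burn=0.8, short_burn=50.0))
# A sustained incident: both windows are above the threshold.
print(should_fire(fast_burn, long_burn=20.0, short_burn=22.0))
# A recovered incident: the long window is still hot, the short one is not.
print(should_fire(fast_burn, long_burn=16.0, short_burn=0.5))
```

Only the sustained case fires. The third case is the less obvious benefit: the short window lets the alert reset soon after the problem stops, instead of staying active until the long window drains.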

Threshold + window-pair math

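The burn-rate threshold and the window length together determine how aggressive the alert is. The canonical paging pairs in the SRE workbook are a burn rate of 14.4 over 1 hour and a burn rate of 6 over 6 hours. For a 30-day SLO these correspond to spending 2% of the error budget in 1 hour and 5% in 6 hours; the math behind them is worth understanding.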
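The thresholds fall out of one formula: burn rate = (fraction of budget spent) x (SLO period) / (window length). The sketch below assumes a 30-day SLO period and the workbook's 2% and 5% budget fractions; the function names are illustrative.

```python
# Assumed for illustration: a 30-day SLO period.
SLO_PERIOD_HOURS = 30 * 24


def burn_rate_threshold(budget_fraction: float, window_hours: float,
                        period_hours: float = SLO_PERIOD_HOURS) -> float:
    """Burn rate that spends `budget_fraction` of the budget in `window_hours`."""
    return budget_fraction * period_hours / window_hours


def hours_to_exhaustion(rate: float,
                        period_hours: float = SLO_PERIOD_HOURS) -> float:
    """How long the whole budget lasts if this burn rate is sustained."""
    return period_hours / rate


for budget_fraction, window_hours in [(0.02, 1), (0.05, 6)]:
    rate = burn_rate_threshold(budget_fraction, window_hours)
    print(f"{budget_fraction:.0%} of budget in {window_hours}h -> "
          f"burn rate {rate:.1f}, "
          f"budget gone in {hours_to_exhaustion(rate):.0f}h if sustained")
```

The two pairs come out at 14.4 and 6. Changing the SLO period or the budget fraction you are willing to lose before paging changes the threshold, which is why the numbers should be derived rather than copied.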

PromQL rule template

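The Prometheus alerting rule comes to roughly 12 lines per SLO. Each SLO needs its own copy of the rule. The burn-rate multipliers can be shared, but the error-ratio threshold in each expression is the multiplier times (1 - SLO target), so a single global rule would be wrong for every SLO with a different target.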
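One way to keep per-SLO rules consistent is to generate the expression from the target. The sketch below is an assumption-heavy illustration, not a definitive template: it presumes error-ratio recording rules already exist under names like job:slo_errors_per_request:ratio_rate1h (the naming used in the SRE workbook), and page_expr is a hypothetical helper.

```python
# (burn rate, long window, short window). The 5m and 30m short windows
# follow the SRE workbook convention of 1/12 of the long window.
PAIRS = [(14.4, "1h", "5m"), (6.0, "6h", "30m")]


def page_expr(slo_target: float, metric_prefix: str, selector: str) -> str:
    """Render a PromQL paging expression for one SLO.

    `metric_prefix` is an assumed recording-rule prefix that the window
    name is appended to; `selector` is a label matcher such as {job="myjob"}.
    """
    budget = 1.0 - slo_target  # allowed error ratio for this SLO
    clauses = []
    for burn, long_w, short_w in PAIRS:
        limit = f"({burn:g} * {budget:g})"
        clauses.append(
            "(\n"
            f"  {metric_prefix}{long_w}{selector} > {limit}\n"
            "  and\n"
            f"  {metric_prefix}{short_w}{selector} > {limit}\n"
            ")"
        )
    return "\nor\n".join(clauses)


print(page_expr(0.999, "job:slo_errors_per_request:ratio_rate", '{job="myjob"}'))
```

The rendered string goes in the expr field of an alerting rule. Calling page_expr with 0.99 instead of 0.999 changes every threshold from a multiple of 0.001 to a multiple of 0.01, which is the per-SLO dependency in practice.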

Antipatterns

What to do this week

Three moves. (1) Apply the pattern to your most impactful service. (2) Measure adherence for 30 days. (3) If a gap persists, rewrite the alerting policy or the SLO.