Alerts Practical
By Samson Tanimawo, PhD
Published May 8, 2026 · 4 min read

Burn-Rate Alert Discipline

Burn-rate alerts catch sustained issues. Here is the discipline that keeps them tuned.

Burn-rate alerts in one paragraph

Page when the error rates over a short window and a long window jointly indicate that the error budget will be exhausted ahead of schedule.

Pair fast (5m, 1h) and slow (6h, 3d) windows. Fast catches sharp regressions; slow catches gradual erosion.

Standard multipliers from the Google SRE Workbook: a 14.4x burn over 1h triggers a page; a 1x burn over 3d triggers a ticket.
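To make those multipliers concrete: at a 99.9% monthly SLO the error budget is 0.1% of requests. A 14.4x burn means a 1.44% error rate, which exhausts a 30-day budget in 30 / 14.4 ≈ 2.1 days; one hour at that rate consumes 14.4 / 720 = 2% of the month's budget.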

Why burn rate beats raw thresholds

Raw thresholds (error rate above 1%) ignore the SLO. A 1% rate is fine for a 99% SLO and disastrous for a 99.99% SLO.

Burn rate is normalized: it measures how fast you are burning the monthly budget, which makes it comparable across services.
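A minimal PromQL sketch of that normalization, assuming a conventional http_requests_total counter with a code label and a 99.9% SLO (0.1% error budget):

    # Burn rate = observed error ratio / error budget (1 - SLO target).
    # 1.0 = burning exactly on schedule; 14.4 = page-worthy fast burn.
    (
      sum(rate(http_requests_total{code=~"5.."}[1h]))
      /
      sum(rate(http_requests_total[1h]))
    ) / 0.001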

It typically reduces noise by 30-50% across an alert catalog because it ignores small spikes that don't threaten the budget.

Config patterns that work

Define SLO once, derive burn-rate rules. Sloth and Pyrra both generate Prometheus rules from a single SLO definition.
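A sketch of what that single definition can look like in Sloth's spec format (the service name and queries are placeholders; check Sloth's docs for the exact schema):

    version: "prometheus/v1"
    service: "checkout"
    slos:
      - name: "requests-availability"
        objective: 99.9
        sli:
          events:
            error_query: sum(rate(http_requests_total{service="checkout",code=~"5.."}[{{.window}}]))
            total_query: sum(rate(http_requests_total{service="checkout"}[{{.window}}]))
        alerting:
          name: CheckoutHighErrorBudgetBurn
          page_alert:
            labels:
              severity: page
          ticket_alert:
            labels:
              severity: ticket

Sloth expands a spec like this into the full set of multiwindow recording and alerting rules.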

Use multi-window multi-burn-rate rules. Single-window rules either over-alert or miss slow drift.

Document the SLO target inline. For example: SLO=99.9 monthly, fast burn=14.4x for 5m+1h, slow burn=6x for 30m+6h.
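Hand-written, the fast-burn page combining those three patterns might look roughly like this (service and metric names are placeholders):

    groups:
      - name: checkout-slo-burn
        rules:
          # SLO=99.9 monthly, fast burn=14.4x for 5m+1h, slow burn=6x for 30m+6h
          - alert: CheckoutFastBurn
            expr: |
              (
                sum(rate(http_requests_total{service="checkout",code=~"5.."}[5m]))
                / sum(rate(http_requests_total{service="checkout"}[5m]))
              ) / 0.001 > 14.4
              and
              (
                sum(rate(http_requests_total{service="checkout",code=~"5.."}[1h]))
                / sum(rate(http_requests_total{service="checkout"}[1h]))
              ) / 0.001 > 14.4
            labels:
              severity: page
            annotations:
              summary: "checkout burning its 99.9% monthly error budget at >14.4x (5m and 1h agree)"

The slow-burn ticket rule has the same shape with 6x and the 30m+6h pair.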

How to roll out

Pick 3 to 5 user-facing SLOs first. Don't try to migrate every metric at once.

Run burn-rate alerts in parallel with old threshold alerts for 30 days. Compare fire counts and resolved-ticket counts.
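One rough way to score that comparison at the end of the trial is Prometheus's built-in ALERTS series (alert names here are placeholders, and this counts firing samples rather than distinct incidents, so treat it as a coarse signal):

    # Time spent firing over the 30-day parallel run,
    # old rule vs. new rule (run as two separate queries).
    sum(count_over_time(ALERTS{alertname="CheckoutErrorRateOld", alertstate="firing"}[30d]))
    sum(count_over_time(ALERTS{alertname="CheckoutFastBurn", alertstate="firing"}[30d]))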

Cut over once the burn-rate version is producing actionable pages and the old rules are demonstrably noisier.

Adopt for the paging tier

Skip for non-SLO signals. Burn rate makes sense only for ratio-style metrics with a target.

Don't apply it to capacity or saturation alerts. Those track absolute resource levels, not error ratios against a budget.

Avoid burn-rate alerts during the first month of a new service. The SLO isn't stable yet, so the derived thresholds drift with every retarget.