Error Rate Burn vs Error Budget Burn
Two related but distinct concepts. Error rate is the number of errors per unit of time or per request; budget burn is the cumulative consumption of the error budget that the SLO allows.
Error rate
Error rate and error budget burn are two different ways of measuring service reliability. Both are useful, they produce different signals, and each deserves its own alerts. Conflating the two creates blind spots; running both covers the gaps each leaves on its own.
What error rate measures:
- Errors per minute, errors per request: The rate is the number of errors over a window, expressed either as a count per unit of time or as a fraction of requests. The window is typically short (1, 5, or 15 minutes). The result is a number that the team can alert on directly.
- Useful for short-term alerting: Error rate is the right signal for "something is wrong right now". The metric responds quickly to spikes, so the alert fires fast and the team can respond.
- Spike-driven: A burst of errors produces a high rate, and the high rate triggers the alert. This catches incidents that cause sudden failure.
- Per-service granularity: Each service has its own error rate. The alert routes to the service's owners, so the response is targeted.
- Threshold tuning: The threshold for what counts as a high rate is calibrated per service. Some services normally run at a 0% error rate, so any error is a signal. Others have a nonzero baseline, and the threshold is tuned above it.
Error rate is the operational metric. It catches incidents in real time.
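The windowed rate and baseline-relative threshold described above can be sketched as follows. This is a minimal illustration, not a production implementation: the class `WindowedErrorRate`, the function `should_alert`, and the `baseline` and `margin` parameters are hypothetical names chosen for this example, and the sample traffic is made up.

```python
from collections import deque


class WindowedErrorRate:
    """Tracks the fraction of failed requests over a sliding time window."""

    def __init__(self, window_seconds: float) -> None:
        self.window_seconds = window_seconds
        self._events = deque()  # (timestamp, is_error) pairs, oldest first

    def record(self, timestamp: float, is_error: bool) -> None:
        self._events.append((timestamp, is_error))

    def rate(self, now: float) -> float:
        # Drop events that have aged out of the window.
        cutoff = now - self.window_seconds
        while self._events and self._events[0][0] <= cutoff:
            self._events.popleft()
        if not self._events:
            return 0.0
        errors = sum(1 for _, is_error in self._events if is_error)
        return errors / len(self._events)


def should_alert(rate: float, baseline: float, margin: float) -> bool:
    """Fire only when the rate exceeds the service's normal baseline by more than margin."""
    return rate > baseline + margin


# Illustrative traffic: one request per second for 5 minutes, one failure every 50 seconds.
tracker = WindowedErrorRate(window_seconds=300)
for second in range(300):
    tracker.record(timestamp=float(second), is_error=(second % 50 == 0))

current_rate = tracker.rate(now=299.0)
fires = should_alert(current_rate, baseline=0.005, margin=0.01)
print(f"error rate: {current_rate:.3f}, alert: {fires}")
```

A service that normally runs at 0% errors would use `baseline=0.0` and a small `margin`, so that almost any error fires the alert.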
Budget burn
Error budget burn is a different metric. It measures progress toward exhausting the SLO's error budget. Burn rate captures the cumulative risk; rate alerts capture the spike.
- Errors as a fraction of allowed errors over the SLO window: The SLO's error budget is the total number of errors allowed within the window. Budget consumption is the fraction of that budget already used. The burn rate is the pace of consumption: the observed error rate divided by the rate that would use up the budget exactly at the end of the window. A burn rate above 1.0 means the budget will run out before the window ends.
- Cumulative: Budget consumption accumulates across the SLO window. A constant low error rate produces a steady burn. A brief spike produces a fast burn that subsides once the spike ends, but the budget it consumed stays spent until the window moves past it.
- Small errors accumulating over hours: A low error rate that does not trigger rate alerts can still consume the error budget over time. That cumulative effect is what burn measures.
- Without firing rate alerts: Rate alerts have thresholds, and rates below those thresholds do not fire. Burn alerts catch the cumulative impact that rate alerts miss.
- Per-SLO measurement: Each SLO has its own burn measurement, taken against that SLO's own budget. Different SLOs on the same service (for example, availability and latency) can be in different burn states.
Budget burn is the strategic metric. It captures the cumulative risk that rate alerts cannot see.
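The burn arithmetic can be made concrete with a short sketch. The function names and the example numbers (a 99.9% SLO over a 30-day window, a sustained 0.3% error ratio) are assumptions chosen for illustration; the burn rate follows the common definition of observed error ratio divided by the error ratio the SLO allows.

```python
def error_budget(slo_target: float) -> float:
    """Allowed error ratio, e.g. about 0.001 for a 99.9% SLO."""
    return 1.0 - slo_target


def burn_rate(observed_error_ratio: float, slo_target: float) -> float:
    """Observed error ratio relative to the allowed ratio.

    1.0 means the budget runs out exactly at the end of the SLO window.
    """
    return observed_error_ratio / error_budget(slo_target)


def budget_consumed(burn: float, elapsed_hours: float, window_hours: float) -> float:
    """Fraction of the total budget used if `burn` is sustained for elapsed_hours."""
    return burn * elapsed_hours / window_hours


def hours_to_exhaustion(burn: float, window_hours: float) -> float:
    """Hours until a full budget is gone at a sustained burn rate."""
    if burn <= 0:
        return float("inf")
    return window_hours / burn


# Assumed example: 99.9% SLO, 30-day window, steady 0.3% errors.
slo_target = 0.999
window_hours = 30 * 24
observed = 0.003  # would sit below a hypothetical 1% rate-alert threshold

burn = burn_rate(observed, slo_target)
print(f"burn rate: {burn:.2f}")
print(f"budget used after one day: {budget_consumed(burn, 24, window_hours):.1%}")
print(f"hours until exhaustion: {hours_to_exhaustion(burn, window_hours):.0f}")
```

Under these assumptions the burn rate is roughly 3, so a budget meant to last 30 days would be gone in about 10, even though a 1% rate threshold would never fire.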
Use both
Rate alerts and burn alerts are complementary. Each catches situations the other misses. Mature alerting strategies use both; the combination produces complete coverage.
- Rate alerts ("something is happening right now"): The rate alert fires when current error volume is unusual. The team responds to the immediate situation; the SLO is a separate concern from the moment-to-moment alert.
- Burn alerts ("we are heading toward an SLO violation by the end of the period"): The burn alert fires when the cumulative pace is too fast. The team's reliability commitment is at risk, and the response is to course-correct before the SLO breaks.
- Both are essential: Running only rate alerts misses cumulative drift. Running only burn alerts misses real-time spikes. Running both closes each gap.
- Different escalation paths: Rate alerts often page; burn alerts often open a ticket. The urgency reflects the underlying signal: rate is "act now", burn is "course-correct soon".
- Both feed the SLO conversation: The team's quarterly SLO review references both. Did rate alerts catch the right things? Did burn alerts prompt the right course-corrections? The data informs future tuning.
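The pairing of the two alert classes, with their different escalation paths, can be sketched as a single evaluation step. Everything here is hypothetical: the `Alert` type, the `evaluate` function, the 5-minute and 6-hour windows, and the threshold values are assumptions for illustration, and real thresholds would be tuned per service and per SLO.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Alert:
    kind: str     # "rate" or "burn"
    action: str   # "page" or "ticket"
    message: str


def evaluate(
    error_ratio_5m: float,
    error_ratio_6h: float,
    slo_target: float,
    rate_threshold: float,
    burn_threshold: float,
) -> List[Alert]:
    """Run both checks; each can fire independently of the other."""
    alerts: List[Alert] = []

    # Rate check: short window, catches spikes, routed as a page.
    if error_ratio_5m > rate_threshold:
        alerts.append(Alert("rate", "page", f"5m error ratio {error_ratio_5m:.2%}"))

    # Burn check: longer window, compared against the SLO's allowed error ratio.
    burn = error_ratio_6h / (1.0 - slo_target)
    if burn > burn_threshold:
        alerts.append(Alert("burn", "ticket", f"6h burn rate {burn:.1f}x"))

    return alerts


# A short spike trips only the rate alert.
spike = evaluate(0.05, 0.0005, slo_target=0.999, rate_threshold=0.01, burn_threshold=2.0)

# A slow drift stays under the rate threshold but trips the burn alert.
drift = evaluate(0.003, 0.003, slo_target=0.999, rate_threshold=0.01, burn_threshold=2.0)

for alert in spike + drift:
    print(alert.kind, alert.action, alert.message)
```

Keeping the two checks independent means a spike and a slow drift are never traded off against each other; either can fire without the other.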
Error rate and error budget burn are a complementary pair. Nova AI Ops integrates with SLO platforms, runs both classes of alerts, and produces the unified view that the engineering team uses to manage both immediate and cumulative reliability.