SLO Policy Document: What to Write, What to Skip
An SLO policy that nobody reads is no policy. The minimum-viable shape is short and concrete.
Why a written policy
Without a written policy, "the SLO" is whatever the most recent meeting said. Memory rots; numbers drift; the next personnel change reopens the debate. A short policy doc is the source of truth that survives manager turnover.
- Memory rots. Six months in, the SLO target is whatever the loudest engineer says it is; the doc anchors it.
- Numbers drift. The target started as 99.9%; in conversation it became 99.5%; the policy doc catches the drift.
- Manager turnover. The new manager inherits the policy, not a memory; the SLO survives the team change.
- The right shape. Short and concrete. Long policies do not get read, so the policy should fit on one page.
Four sections that matter
Four sections cover the policy contract: SLO target, error budget, action when missed, and review cadence. Each is a question the next reader will ask, and each deserves an explicit answer.
- The SLO target. The number together with its definition, for example "99.9% of HTTP requests return a 200 response over a rolling 30-day window."
- The error budget. The consumption rules: how the budget is computed and when burn-rate alerts fire.
- The action when missed. Stop feature work? Page an executive? The doc is the contract; without a stated action, the SLO is decoration.
- The review cadence. A quarterly check and an annual revisit keep the policy current with reality.
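The error-budget arithmetic behind the first two sections can be sketched in a few lines. This is a minimal illustration, not a prescribed implementation: the `SloPolicy` class, its field names, the example action, and the request counts are all assumptions made for this sketch, and burn rate is taken here as the observed error rate divided by the budgeted error rate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SloPolicy:
    """Hypothetical container for the four policy sections."""

    target: float             # SLO target as a fraction, e.g. 0.999 for 99.9%
    window_days: int          # rolling window length, e.g. 30
    action_when_missed: str   # the agreed consequence, in plain words
    review_cadence: str       # e.g. "quarterly check, annual revisit"

    def error_budget(self) -> float:
        """Fraction of requests that may fail within the window."""
        return 1.0 - self.target

    def allowed_failures(self, total_requests: int) -> int:
        """Budget as a request count, rounded to the nearest request."""
        return round(total_requests * self.error_budget())

    def burn_rate(self, failed: int, total: int) -> float:
        """Observed error rate divided by the budgeted error rate.

        A value of 1.0 spends the budget exactly over the window;
        anything above 1.0 exhausts it early.
        """
        if total == 0:
            return 0.0
        return (failed / total) / self.error_budget()


policy = SloPolicy(
    target=0.999,
    window_days=30,
    action_when_missed="stop feature work until the budget recovers",
    review_cadence="quarterly check, annual revisit",
)

# Example numbers are made up for illustration.
budget_requests = policy.allowed_failures(10_000_000)
recent_burn = policy.burn_rate(failed=20, total=10_000)
print(budget_requests, round(recent_burn, 2))
```

With a 99.9% target, 10 million requests in the window leave room for about 10,000 failures. Twenty failures in 10,000 recent requests is a burn rate of about 2, so the budget would run out in roughly half the window if that rate held. The threshold at which a burn rate should page someone is a policy decision and is deliberately left out of the sketch.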
Three to skip
Skip three things: a long history of past SLOs (use git), architecture diagrams (use the design doc), and lessons learned (use postmortems).
The policy is a contract, not a wiki article.
Review cadence
Quarterly review: did we hit it? If not, what changed? Either adjust the target or invest in reliability.
Annual review: is this SLO still the right one? Customer expectations move.
Antipatterns
- SLO policy in one engineer’s head. Bus factor 1.
- 20-page policy. Nobody reads it.
- Policy that does not say what to do when missed. Decoration.
What to do this week
Three moves. (1) Write the one-page policy for your most impactful service. (2) Measure adherence to the target for 30 days. (3) Rewrite the policy or the SLO if the gap between target and measurement is durable.
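Measuring adherence can be as simple as summing daily request counts. The sketch below assumes you can export one (good, total) pair per day from your metrics system; the function names and the sample numbers are hypothetical.

```python
from typing import Sequence, Tuple

DailyCount = Tuple[int, int]  # (good_requests, total_requests) for one day


def attainment(daily_counts: Sequence[DailyCount]) -> float:
    """Fraction of good requests across the whole window."""
    good = sum(g for g, _ in daily_counts)
    total = sum(t for _, t in daily_counts)
    if total == 0:
        raise ValueError("no requests recorded in the window")
    return good / total


def gap_to_target(daily_counts: Sequence[DailyCount], target: float) -> float:
    """Positive when the SLO was met, negative when it was missed."""
    return attainment(daily_counts) - target


# Made-up 30-day sample: 29 quiet days and one bad day.
window = [(999, 1000)] * 29 + [(950, 1000)]
measured = attainment(window)
shortfall = gap_to_target(window, target=0.999)
print(f"attainment={measured:.4%} gap={shortfall:+.4%}")
```

In this sample a single bad day is enough to pull the 30-day figure below 99.9%. Summing counts before dividing weights busy days more heavily than averaging daily percentages would, which usually matches what a request-based SLO intends.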