The SLI Revision Cadence That Keeps Targets Honest
SLIs and SLOs drift, so revisit them quarterly. This piece covers the review format, the questions to ask, and what teams typically change in their second year.
Quarterly agenda
The quarterly review is what keeps SLIs from rotting into ceremony. Three questions per service, with the outcome recorded in writing, set the agenda.
- SLI fitness. What is the current SLI/SLO? Did it match user-perceived reliability for the last quarter?
- Customer feedback. Are customers complaining about reliability dimensions the SLI does not measure?
- Engineering cost. Is the team paying to over-deliver on the SLO? Could a relaxed SLO free meaningful engineering capacity?
- Decision log. Outcome captured in writing; future reviews start from the prior decision, not from scratch.
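The fitness and engineering-cost questions are easier to answer with the quarter's numbers in front of the room. Below is a minimal Python sketch, assuming the SLI is a ratio of good events to total events. The `QuarterSummary` type, the `review_prompt` helper, and the 20% `low_use` threshold are hypothetical illustrations, not a standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuarterSummary:
    """One service's SLI totals for the quarter (hypothetical shape)."""
    service: str
    slo_target: float   # e.g. 0.999 means 99.9% of events must be good
    good_events: int
    total_events: int


def attainment(q: QuarterSummary) -> float:
    """Fraction of events that were good over the quarter."""
    if q.total_events == 0:
        raise ValueError("no events recorded for the quarter")
    return q.good_events / q.total_events


def budget_consumed(q: QuarterSummary) -> float:
    """Share of the error budget used: 0.0 is none, 1.0 is all of it."""
    allowed_bad = (1.0 - q.slo_target) * q.total_events
    actual_bad = q.total_events - q.good_events
    if allowed_bad == 0:
        # A 100% target (or no traffic) leaves no budget to spend.
        return float("inf") if actual_bad else 0.0
    return actual_bad / allowed_bad


def review_prompt(q: QuarterSummary, low_use: float = 0.2) -> str:
    """Label the quarter to seed the discussion; low_use is an assumed cutoff."""
    used = budget_consumed(q)
    if used > 1.0:
        return "missed"           # SLO breached: is the target or the service wrong?
    if used < low_use:
        return "over-delivering"  # candidate for the engineering-cost question
    return "on-target"
```

For example, a 99.9% target with 50 bad events out of 1,000,000 uses about 5% of the error budget, which this sketch labels as over-delivering. The label is only a prompt for the discussion; the decision still belongs in the written log.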
Common changes in year two
Patterns repeat across teams that have run SLOs for a year. Four changes show up almost universally; expect them and plan around them.
- Add latency SLI. Most teams start with availability; year two adds latency for user-facing services.
- Tighten or loosen SLO. The first SLO is usually wrong, and a year of data shows in which direction; adjust it without attachment to the original number.
- Segment by user. 'p99 latency for premium customers' becomes a separate SLI when stakes differ.
- Drop dead SLIs. Some SLIs never triggered an alert or informed a decision, and never will; remove them so the dashboard reflects what matters.
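The latency and segmentation changes can be sketched together. The following Python example assumes a threshold-style latency SLI (the fraction of requests served within a limit) computed separately per user tier. The `Request` type, the tier labels, and the threshold value are hypothetical. Checking whether a tier's fraction is at least 0.99 asks roughly the same question as checking whether its p99 latency is within the threshold.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class Request(NamedTuple):
    tier: str          # hypothetical segment label, e.g. "premium" or "free"
    latency_ms: float


def latency_sli_by_tier(
    requests: Iterable[Request], threshold_ms: float
) -> dict[str, float]:
    """Per tier, the fraction of requests served within threshold_ms."""
    fast = defaultdict(int)
    total = defaultdict(int)
    for r in requests:
        total[r.tier] += 1
        if r.latency_ms <= threshold_ms:
            fast[r.tier] += 1
    # Only tiers that received traffic appear, so total[tier] is never zero.
    return {tier: fast[tier] / total[tier] for tier in total}


sample = [
    Request("premium", 100.0),
    Request("premium", 300.0),
    Request("free", 100.0),
]
per_tier = latency_sli_by_tier(sample, threshold_ms=200.0)
```

Keeping the tiers in one function makes it cheap to see when their numbers diverge, which is the signal that a segment deserves its own SLO.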
Avoid
Four failure modes corrode SLO discipline. Each one is tempting in the short term and destructive over time.
- Easing SLOs to look good. The point is honesty, not optics; tune SLO to reality, not to attainment dashboards.
- Too many SLIs. 3 to 5 per service is plenty; more becomes noise that nobody triages.
- Skipping the cadence. Without ritual, SLIs drift from reality; quarterly is the floor, not the ceiling.
- SLO as KPI. Tying SLO attainment to performance reviews creates pressure to game the metric; keep them as engineering tools.
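Two of these failure modes, too many SLIs and a skipped cadence, are mechanical enough to check automatically. Below is a minimal Python sketch, assuming each service records its SLI names and the date of its last review. The `cadence_warnings` helper and the 92-day limit are hypothetical; the cap of 5 is taken from the "3 to 5" guideline above.

```python
from datetime import date, timedelta

MAX_SLIS_PER_SERVICE = 5              # upper end of the "3 to 5" guideline
MAX_REVIEW_AGE = timedelta(days=92)   # assumption: roughly one quarter


def cadence_warnings(
    sli_names: list[str], last_review: date, today: date
) -> list[str]:
    """Return human-readable warnings for one service; empty means healthy."""
    warnings = []
    if len(sli_names) > MAX_SLIS_PER_SERVICE:
        warnings.append(
            f"{len(sli_names)} SLIs defined; more than "
            f"{MAX_SLIS_PER_SERVICE} tends to become noise"
        )
    if today - last_review > MAX_REVIEW_AGE:
        age_days = (today - last_review).days
        warnings.append(f"last SLO review was {age_days} days ago")
    return warnings
```

The other two failure modes, easing SLOs for optics and tying them to performance reviews, are organizational choices that no script will catch; they stay on the review agenda.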