SLO Incident Correlation
Linking incidents to the SLO breaches and error-budget burn they cause.
Tracking SLO impact per incident
Each incident records the SLO budget it consumed, the services affected, and its duration. This quantifies impact in concrete budget terms.
Per-postmortem: budget consumption table. 'This incident consumed 23% of monthly budget.' Concrete, comparable.
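A minimal sketch of the per-incident record and the arithmetic behind a line like the one above, assuming a time-based availability SLO (bad minutes counted over a 30-day window). `IncidentRecord`, its field names, and the 99.9% target are hypothetical illustrations, not a prescribed schema. Under these assumptions the monthly budget is 43.2 minutes, so 10 bad minutes works out to roughly 23%.

```python
from dataclasses import dataclass


@dataclass
class IncidentRecord:
    # Hypothetical schema; field names are illustrative, not a standard.
    incident_id: str
    services_affected: list[str]
    duration_minutes: float  # wall-clock length of the incident
    bad_minutes: float  # minutes counted against the SLO (<= duration for partial outages)


def monthly_budget_minutes(slo_target: float, window_days: int = 30) -> float:
    """Error budget of a time-based SLO: the allowed share of bad minutes in the window."""
    return (1.0 - slo_target) * window_days * 24 * 60


def budget_consumed_fraction(incident: IncidentRecord, slo_target: float, window_days: int = 30) -> float:
    """Fraction of the window's error budget consumed by one incident."""
    return incident.bad_minutes / monthly_budget_minutes(slo_target, window_days)


# Hypothetical incident: 25 minutes long, 10 of them counted as bad minutes.
incident = IncidentRecord("INC-1", ["checkout"], duration_minutes=25, bad_minutes=10)
share = budget_consumed_fraction(incident, slo_target=0.999)
print(f"This incident consumed {share:.0%} of monthly budget.")
```

Separating `bad_minutes` from `duration_minutes` keeps a long, partial degradation from being charged as if it were a full outage.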
Aggregate quarterly: total budget consumption by cause class. Patterns surface.
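The quarterly roll-up is a group-by over (quarter, cause class). This sketch assumes calendar quarters and incidents stored as dicts with hypothetical `date`, `cause`, and `budget_consumed` keys, where `budget_consumed` is a fraction of the monthly budget.

```python
from collections import defaultdict
from datetime import date


def quarter_of(d: date) -> str:
    """Label a date with its calendar quarter, e.g. '2024-Q3'."""
    return f"{d.year}-Q{(d.month - 1) // 3 + 1}"


def budget_by_cause(incidents: list[dict]) -> dict[tuple[str, str], float]:
    """Sum budget consumption per (quarter, cause class).

    Values are in monthly-budget units, so a quarter holds roughly 3.0.
    """
    totals: dict[tuple[str, str], float] = defaultdict(float)
    for inc in incidents:
        totals[(quarter_of(inc["date"]), inc["cause"])] += inc["budget_consumed"]
    return dict(totals)


# Hypothetical incidents for illustration only.
incidents = [
    {"date": date(2024, 7, 3), "cause": "deploy", "budget_consumed": 0.23},
    {"date": date(2024, 8, 19), "cause": "dependency", "budget_consumed": 0.10},
    {"date": date(2024, 9, 2), "cause": "deploy", "budget_consumed": 0.15},
]
totals = budget_by_cause(incidents)
print(totals)
```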
Aggregating to find patterns
Per quarter: top causes of budget consumption. Deploy-related, dependency failures, configuration issues, capacity issues.
Top contributing services. The same service repeatedly burning budget points to architectural fragility.
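Ranking services needs both the budget total and the incident count, since one large outage and many small ones call for different responses. This sketch assumes each incident is attributed to a single hypothetical `primary_service`; incidents spanning several services would need an attribution rule of their own.

```python
from collections import Counter, defaultdict


def top_services(incidents: list[dict], n: int = 3) -> list[tuple[str, float, int]]:
    """Return (service, total budget consumed, incident count), largest budget first."""
    budget: dict[str, float] = defaultdict(float)
    count: Counter = Counter()
    for inc in incidents:
        svc = inc["primary_service"]  # assumption: one owning service per incident
        budget[svc] += inc["budget_consumed"]
        count[svc] += 1
    ranked = sorted(budget, key=lambda s: budget[s], reverse=True)
    return [(s, budget[s], count[s]) for s in ranked[:n]]


# Hypothetical data: "payments" burns budget through repeated incidents.
incidents = [
    {"primary_service": "payments", "budget_consumed": 0.12},
    {"primary_service": "search", "budget_consumed": 0.05},
    {"primary_service": "payments", "budget_consumed": 0.20},
    {"primary_service": "payments", "budget_consumed": 0.08},
]
result = top_services(incidents, n=2)
print(result)
```

A service near the top with a high incident count, rather than a single large outage, is the repeated-burner pattern that suggests fragility.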
Time-of-day patterns. Are budget burns concentrated during deploys, at peak traffic, or off-hours? The pattern informs investment.
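Bucketing incident start times makes the time-of-day question answerable from data. This is a simplified sketch: the peak window (09:00 to 17:59), the 60-minute post-deploy window, and the `start`/`budget_consumed` keys are all assumptions, and weekends and time zones are ignored.

```python
from collections import defaultdict
from datetime import datetime, timedelta

PEAK_HOURS = range(9, 18)  # assumption: 09:00-17:59 local time is peak traffic
DEPLOY_WINDOW = timedelta(minutes=60)  # assumption: starts within 1h after a deploy count as deploy-time


def classify_start(start: datetime, deploy_times: list[datetime]) -> str:
    """Bucket an incident start as 'deploy', 'peak', or 'off-hours'."""
    # Deploy proximity is checked first, so a deploy-time incident at peak counts as 'deploy'.
    if any(timedelta(0) <= start - d <= DEPLOY_WINDOW for d in deploy_times):
        return "deploy"
    if start.hour in PEAK_HOURS:
        return "peak"
    return "off-hours"


def budget_by_bucket(incidents: list[dict], deploy_times: list[datetime]) -> dict[str, float]:
    """Sum budget consumption per time-of-day bucket."""
    totals: dict[str, float] = defaultdict(float)
    for inc in incidents:
        totals[classify_start(inc["start"], deploy_times)] += inc["budget_consumed"]
    return dict(totals)


# Hypothetical deploy and incidents.
deploys = [datetime(2024, 7, 3, 14, 0)]
incidents = [
    {"start": datetime(2024, 7, 3, 14, 20), "budget_consumed": 0.10},
    {"start": datetime(2024, 7, 5, 11, 0), "budget_consumed": 0.05},
    {"start": datetime(2024, 7, 6, 2, 30), "budget_consumed": 0.20},
]
print(budget_by_bucket(incidents, deploys))
```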
Investment decisions from patterns
Top cause class drives engineering investment. If 60% of budget consumption comes from deploy-related issues, deploy reliability is the priority.
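Turning per-cause totals into shares makes the top cause class explicit. The quarter totals here are hypothetical figures chosen so that deploy-related issues account for 60%; `cause_shares` is an illustrative helper, not an existing tool.

```python
def cause_shares(totals: dict[str, float]) -> list[tuple[str, float]]:
    """Convert per-cause budget totals into shares of the whole, largest first."""
    grand_total = sum(totals.values())
    if grand_total == 0:
        return []  # no budget consumed: nothing to prioritize
    shares = [(cause, amount / grand_total) for cause, amount in totals.items()]
    return sorted(shares, key=lambda pair: pair[1], reverse=True)


# Hypothetical quarter totals, in monthly-budget units.
quarter_totals = {"deploy": 0.90, "dependency": 0.30, "config": 0.20, "capacity": 0.10}
ranked = cause_shares(quarter_totals)
top_cause, top_share = ranked[0]
print(f"Priority: {top_cause} ({top_share:.0%} of budget consumed)")
```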
Top contributing service drives architectural review. Repeated incidents in service X suggest that service X needs structural changes.
Patterns inform staffing. A concentration of budget burn in off-hours may indicate a need for follow-the-sun coverage.
The correlation dashboard
Per-quarter SLO budget burn-down. Stacked by cause class. Trend visible.
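The stacked burn-down can be fed by one cumulative series per cause class; a charting tool then stacks them. This sketch only prepares the data. It assumes dict incidents with hypothetical `date`, `cause`, and `budget_consumed` keys, and a quarterly budget of 3.0 when consumption is expressed in monthly-budget units.

```python
from collections import defaultdict
from datetime import date
from itertools import accumulate


def cumulative_burn_by_cause(incidents: list[dict], quarter_start: date, days: int) -> dict[str, list[float]]:
    """Cumulative budget consumed per cause class for each day of the quarter."""
    daily: dict[str, list[float]] = defaultdict(lambda: [0.0] * days)
    for inc in incidents:
        offset = (inc["date"] - quarter_start).days
        if 0 <= offset < days:  # ignore incidents outside this quarter
            daily[inc["cause"]][offset] += inc["budget_consumed"]
    return {cause: list(accumulate(values)) for cause, values in daily.items()}


def remaining_budget(series: dict[str, list[float]], total_budget: float, days: int) -> list[float]:
    """Budget left each day: the total minus the stacked consumption of all causes."""
    return [total_budget - sum(s[d] for s in series.values()) for d in range(days)]


# Hypothetical Q3 incidents (Q3 has 92 days).
q_start = date(2024, 7, 1)
incidents = [
    {"date": date(2024, 7, 3), "cause": "deploy", "budget_consumed": 0.2},
    {"date": date(2024, 7, 5), "cause": "dependency", "budget_consumed": 0.1},
    {"date": date(2024, 7, 6), "cause": "deploy", "budget_consumed": 0.3},
]
series = cumulative_burn_by_cause(incidents, q_start, days=92)
remaining = remaining_budget(series, total_budget=3.0, days=92)
print(remaining[-1])
```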
Per-service contribution table. Sortable, with drill-down to specific incidents.
Recent budget-impacting incidents. Quick reference for ongoing context.
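The recent-incidents panel reduces to a filter and a sort. This sketch assumes a 30-day lookback and dict incidents with hypothetical `id`, `date`, and `budget_consumed` keys; incidents that consumed no budget are left out.

```python
from datetime import date, timedelta


def recent_impacting(incidents: list[dict], today: date, lookback_days: int = 30) -> list[dict]:
    """Incidents inside the lookback window that consumed any budget, newest first."""
    cutoff = today - timedelta(days=lookback_days)
    recent = [i for i in incidents if i["date"] >= cutoff and i["budget_consumed"] > 0]
    return sorted(recent, key=lambda i: i["date"], reverse=True)


# Hypothetical incidents.
incidents = [
    {"id": "INC-1", "date": date(2024, 9, 2), "budget_consumed": 0.15},
    {"id": "INC-2", "date": date(2024, 8, 19), "budget_consumed": 0.10},  # outside the window
    {"id": "INC-3", "date": date(2024, 9, 20), "budget_consumed": 0.0},  # no budget impact
    {"id": "INC-4", "date": date(2024, 9, 25), "budget_consumed": 0.05},
]
print([i["id"] for i in recent_impacting(incidents, today=date(2024, 9, 30))])
```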
Review cadence
Monthly SLO review includes incident correlation. Spot trends early.
Quarterly engineering review. Investment decisions follow the data.
Annual reliability strategy update. Multi-quarter patterns inform multi-year roadmap.