SLO Incident Correlation

How to tie each incident to the SLO budget it consumed, and how to act on the patterns that emerge.

Tracking SLO impact per incident

Tying every incident to SLO budget consumption turns abstract reliability work into a concrete number. Budget consumed, services affected, and duration recorded per incident produce a comparable artefact across the year.

SLO impact recorded per incident. Capture three values during the postmortem: budget consumed, services affected, and duration. With those on record, the math becomes concrete.
Postmortem budget table. Every postmortem carries a "Consumed X percent of monthly budget" line, which makes incidents comparable across quarters.
Quarterly aggregate by cause class. A per-quarter view of consumption by cause class surfaces patterns that individual incidents hide.
Documented impact per incident. Each incident gets a named budget-consumption number, which supports honest reporting rather than vibes.
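
The arithmetic behind a "Consumed X percent of monthly budget" line can be sketched as follows. This is a minimal illustration, assuming an availability-style SLO measured in minutes over a 30-day window; the names IncidentImpact, bad_minutes, and budget_consumed_percent are hypothetical, and the incident figures are made up.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class IncidentImpact:
    """Hypothetical per-incident record captured during the postmortem."""
    incident_id: str
    services: Tuple[str, ...]  # services affected
    duration_minutes: float    # wall-clock duration of the incident
    bad_minutes: float         # assumed: duration scaled by the failing-request ratio


def monthly_budget_minutes(slo_target: float, window_days: int = 30) -> float:
    """Error budget in minutes for an availability target such as 0.999."""
    return (1.0 - slo_target) * window_days * 24 * 60


def budget_consumed_percent(impact: IncidentImpact, slo_target: float,
                            window_days: int = 30) -> float:
    """Share of the window's error budget that one incident consumed."""
    return 100.0 * impact.bad_minutes / monthly_budget_minutes(slo_target, window_days)


# Made-up example: a 36-minute incident in which 60 percent of requests failed.
impact = IncidentImpact("INC-EXAMPLE", ("checkout",), 36.0, 36.0 * 0.6)
print(f"Consumed {budget_consumed_percent(impact, 0.999):.0f} percent of monthly budget")
```

A 99.9 percent target over 30 days leaves 43.2 minutes of budget, so the 21.6 bad minutes in this example account for roughly half of it. A request-based SLO would substitute failed requests for bad minutes; the division stays the same.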

Aggregating to find patterns

Aggregation surfaces the patterns individual incidents hide. Top causes, top contributing services, and time-of-day concentration each tell a different story about where investment should land.

Top causes per quarter. Break consumption down into deploy-related, dependency-failure, configuration, and capacity causes; the dominant cause class drives priority.
Top contributing services. Rank services by budget consumed; a service that repeatedly burns budget points to architectural fragility, not bad luck.
Time-of-day patterns. Check whether incidents cluster in deploy windows, at peak traffic, or off-hours; the timing shows where a staffing or process change would bend the curve.
Named pattern owner per quarter. Assign a responsible analyst each cycle; without an owner, the failure mode is "we never actually looked across incidents".
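
The three aggregations above reduce to one grouping operation with different keys. The sketch below assumes each incident record already carries a cause class, a service, a start time, and a budget-consumption figure; the field names, the sample records, and the business-hours boundary in time_bucket are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime
from typing import Callable, Dict, List, Tuple

# Made-up incident records; field names are illustrative.
INCIDENTS: List[Dict] = [
    {"id": "A", "cause": "deploy", "service": "checkout",
     "started": datetime(2024, 1, 9, 14, 5), "budget_pct": 12.0},
    {"id": "B", "cause": "dependency-failure", "service": "search",
     "started": datetime(2024, 2, 3, 2, 40), "budget_pct": 8.0},
    {"id": "C", "cause": "deploy", "service": "checkout",
     "started": datetime(2024, 2, 20, 15, 30), "budget_pct": 18.0},
    {"id": "D", "cause": "capacity", "service": "search",
     "started": datetime(2024, 3, 11, 19, 0), "budget_pct": 5.0},
]


def time_bucket(started: datetime) -> str:
    """Assumed boundary: 09:00 to 17:59 is business hours, the rest off-hours."""
    return "business-hours" if 9 <= started.hour < 18 else "off-hours"


def burn_by(incidents: List[Dict],
            key: Callable[[Dict], str]) -> List[Tuple[str, float]]:
    """Sum budget consumption per group, largest contributor first."""
    totals: Dict[str, float] = defaultdict(float)
    for inc in incidents:
        totals[key(inc)] += inc["budget_pct"]
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


top_causes = burn_by(INCIDENTS, lambda inc: inc["cause"])
top_services = burn_by(INCIDENTS, lambda inc: inc["service"])
by_time = burn_by(INCIDENTS, lambda inc: time_bucket(inc["started"]))

for label, ranking in (("cause", top_causes), ("service", top_services), ("time", by_time)):
    print(label, ranking)
```

Summing budget consumed rather than counting incidents keeps one long outage from being outweighed by several brief ones.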

Investment decisions from patterns

Patterns drive investment decisions. The top cause class drives engineering work, the top service drives architectural review, and time-of-day concentration drives staffing.

Top cause class drives engineering. If 60 percent of the quarter's budget burn is deploy-related, deploy reliability is the quarter's priority; the data picks the work.
Top service drives architectural review. Repeated incidents in the same service warrant structural change, not yet another tactical fix.
Staffing follows time-of-day concentration. An off-hours burn pattern may indicate the need for follow-the-sun coverage or schedule changes.
Documented driver per decision. Write down which pattern drove each investment; the record supports honest prioritisation when the next quarter argues differently.
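
One way to make the pattern-to-investment linkage explicit is to compute the dominant share and store it next to the decision it justified. The following is a rough sketch under assumed names (dominant_share, InvestmentDecision) with made-up burn figures; it shows the shape of the record, not a prescribed format.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


def dominant_share(burn_by_key: Dict[str, float]) -> Tuple[str, float]:
    """Return the largest contributor and its percentage share of total burn."""
    total = sum(burn_by_key.values())
    if total <= 0:
        raise ValueError("no budget burn recorded")
    top = max(burn_by_key, key=lambda k: burn_by_key[k])
    return top, 100.0 * burn_by_key[top] / total


@dataclass(frozen=True)
class InvestmentDecision:
    """Hypothetical written record linking a pattern to the work it justified."""
    quarter: str
    pattern: str
    investment: str


# Made-up quarterly burn by cause class, in percent of budget.
burn = {"deploy": 30.0, "dependency-failure": 10.0,
        "configuration": 6.0, "capacity": 4.0}

cause, share = dominant_share(burn)
decision = InvestmentDecision(
    quarter="Q-example",
    pattern=f"{share:.0f} percent of burn is {cause}-related",
    investment=f"{cause} reliability",
)
print(decision)
```

Keeping the pattern as text beside the investment means a later review can check whether the pattern still holds before the priority is changed.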

The correlation dashboard

The dashboard makes the patterns visible. Burn-down by cause, per-service contribution, and recent budget-impacting incidents render the analysis as a single view.

Quarterly SLO budget burn-down. A view stacked by cause class shows the quarter's trend without manual aggregation.
Per-service contribution table. A sortable per-service view with drill-down to specific incidents supports investigation.
Recent budget-impacting incidents. A live list of recent incidents gives quick context during reviews.
Named owner per dashboard. Assign a responsible reliability lead per org; without one, dashboards go stale or wrong and mislead rather than inform.
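
The data behind a stacked-by-cause burn-down is a pivot of incident records into one row per month. The sketch below assumes incidents arrive as (month, cause class, percent of monthly budget) tuples; the function name stacked_rows, the tuple layout, and the sample values are illustrative, and rendering is left to whatever charting tool the dashboard uses.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

CAUSE_CLASSES = ("deploy", "dependency-failure", "configuration", "capacity")


def stacked_rows(incidents: Iterable[Tuple[str, str, float]]) -> List[Dict]:
    """Pivot (month, cause, budget_pct) tuples into one row per month.

    Each row holds per-cause consumption plus the budget remaining that month.
    """
    per_month: Dict[str, Dict[str, float]] = defaultdict(
        lambda: dict.fromkeys(CAUSE_CLASSES, 0.0)
    )
    for month, cause, pct in incidents:
        stack = per_month[month]
        stack[cause] = stack.get(cause, 0.0) + pct  # tolerate unlisted cause classes
    rows = []
    for month in sorted(per_month):
        stack = per_month[month]
        rows.append({"month": month, **stack,
                     "remaining": 100.0 - sum(stack.values())})
    return rows


# Made-up quarter of incidents.
rows = stacked_rows([
    ("2024-01", "deploy", 12.0),
    ("2024-02", "dependency-failure", 8.0),
    ("2024-02", "deploy", 18.0),
    ("2024-03", "capacity", 5.0),
])
for row in rows:
    print(row)
```

Each row maps directly onto one stacked bar, with the remaining value as the unburned portion of that month's budget.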

Review cadence

Reviews run at three cadences. Monthly for trend-spotting, quarterly for investment decisions, annual for reliability strategy. Each cadence answers a different question.

Monthly SLO review. Include incident correlation in the monthly cycle; trends surface while there is still time to act on them.
Quarterly engineering review. Review investment decisions each quarter so that engineering hours follow the data rather than the loudest voice.
Annual reliability strategy. A multi-quarter pattern review feeds the multi-year roadmap; one quarter is noise, four quarters are signal.
Documented output per cadence. Name decisions or actions at every review; "we reviewed but didn't decide" is the failure mode.