Incident Pattern Library
Past incidents teach.
Overview
An incident pattern library indexes past incidents by recurring pattern (cache stampede, deployment regression, certificate expiry, dependency saturation) so that the on-call engineer recognises the incident class within minutes instead of rebuilding the diagnosis from scratch. Postmortems record what happened; the pattern library makes those records searchable and actionable on the next 3am bridge call.
- Past incidents teach. Each incident belongs to a recurring class; the library captures the class, not just the instance.
- Pattern indexing. Each incident carries a searchable tag (cache_stampede, deploy_regression, cert_expiry); the on-call greps by symptom and lands on the matching runbook.
- Per-pattern runbook. Each pattern has an explicit recovery sequence; the on-call follows steps, not vibes.
- Per-pattern detection plus committed library. Each pattern has an early-warning signal that has been moved into monitoring, and the searchable archive is part of the team handbook.
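The indexing described above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed format: the `Pattern` fields, the runbook paths, the alert names, and the symptom phrases are all hypothetical, and the matching is a plain substring search that a real library might replace with a wiki search or a proper index.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pattern:
    tag: str                # searchable tag, e.g. "cache_stampede"
    symptoms: tuple         # phrases an on-call engineer is likely to search for
    runbook: str            # hypothetical path to the recovery sequence
    detection_signal: str   # hypothetical name of the early-warning alert


# Illustrative entries only; real entries come from the team's postmortems.
LIBRARY = [
    Pattern("cache_stampede",
            ("origin load spike", "cache hit rate drop"),
            "runbooks/cache_stampede.md", "cache_hit_rate_low"),
    Pattern("deploy_regression",
            ("error rate jump after release", "latency regression after release"),
            "runbooks/deploy_regression.md", "post_deploy_error_rate"),
    Pattern("cert_expiry",
            ("tls handshake failures", "certificate expired"),
            "runbooks/cert_expiry.md", "cert_days_remaining_low"),
]


def find_patterns(query, library=LIBRARY):
    """Return patterns whose tag or symptoms contain every word of the query."""
    words = query.lower().split()
    if not words:
        return []
    matches = []
    for pattern in library:
        # Search the tag (underscores as spaces) together with the symptom phrases.
        haystack = " ".join((pattern.tag.replace("_", " "), *pattern.symptoms)).lower()
        if all(word in haystack for word in words):
            matches.append(pattern)
    return matches


if __name__ == "__main__":
    for hit in find_patterns("cache hit rate"):
        print(hit.tag, hit.runbook, hit.detection_signal)
```

Keeping the tag, runbook, and detection signal in one record means a single symptom search returns everything the on-call needs for that pattern.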
The approach
The practical approach has five parts: tagging patterns at postmortem time rather than retrospectively, linking each pattern to its recovery runbook, feeding per-pattern detection signals back into monitoring, reviewing quarterly to fold new incidents into existing patterns or open new ones, and documenting the library structure so the next operator can navigate it without coaching.
- Pattern tagging. Per-incident searchable tag assigned during the postmortem; the tag is what makes the library navigable.
- Runbook linking. Per-pattern explicit runbook; the on-call clicks one link and gets the recovery sequence.
- Detection signal. Per-pattern early-warning indicator added to monitoring; a recurrence is caught before it becomes a page.
- Per-quarter review plus documented library. Quarterly review folds new incidents into patterns or creates new ones; the library structure is committed to the team handbook for onboarding.
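The quarterly review step can be sketched as a small script. The record shapes here are assumptions for illustration: each incident is a dict with an `id` and a `pattern` tag assigned at postmortem time, and each known pattern is a dict with `runbook` and `detection_signal` fields. The incident IDs and pattern entries are made up.

```python
from collections import defaultdict


def quarterly_review(incidents, known_patterns):
    """Fold the quarter's incidents into known patterns and list what needs attention.

    Returns (folded, new_candidates, gaps):
      folded          maps each known tag to the incident ids that matched it
      new_candidates  lists incident ids with a missing or unknown tag
      gaps            lists known tags lacking a runbook or a detection signal
    """
    folded = defaultdict(list)
    new_candidates = []
    for incident in incidents:
        tag = incident.get("pattern")
        if tag in known_patterns:
            folded[tag].append(incident["id"])
        else:
            # Untagged or unrecognised: discuss whether this opens a new pattern.
            new_candidates.append(incident["id"])
    gaps = sorted(
        tag for tag, entry in known_patterns.items()
        if not entry.get("runbook") or not entry.get("detection_signal")
    )
    return dict(folded), new_candidates, gaps


# Hypothetical sample data for one quarter.
known = {
    "cache_stampede": {"runbook": "runbooks/cache_stampede.md",
                       "detection_signal": "cache_hit_rate_low"},
    "cert_expiry": {"runbook": "runbooks/cert_expiry.md",
                    "detection_signal": ""},  # signal not yet in monitoring
}
quarter = [
    {"id": "INC-1", "pattern": "cache_stampede"},
    {"id": "INC-2", "pattern": "cert_expiry"},
    {"id": "INC-3", "pattern": None},
    {"id": "INC-4", "pattern": "cache_stampede"},
    {"id": "INC-5", "pattern": "dns_misconfig"},
]

folded, new_candidates, gaps = quarterly_review(quarter, known)
```

With this sample, INC-3 and INC-5 surface as candidates for new patterns, and cert_expiry is flagged because it has no detection signal yet.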
Why this compounds
Pattern library discipline compounds across incidents. Each tagged incident grows the searchable archive; each runbook link converts diagnosis time into recovery time; each detection signal converts incidents into near-misses. After a year, the on-call recognises 60 percent of incidents within minutes; after two, the rotation can onboard new engineers in weeks rather than months.
- Faster recognition. A well-maintained library matches the incident class quickly; MTTR drops because diagnosis becomes recognition, not investigation.
- Better prevention. Per-pattern detection signals catch recurrence early; many incidents become near-misses caught by monitoring.
- Operator experience. Pattern recognition reduces cognitive load on the bridge; the on-call follows known steps under stress.
- Institutional knowledge. Each pattern teaches incident structure; the team builds a vocabulary that survives team turnover.
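One way to check whether the library is compounding is to track how many incidents matched an existing pattern and how recovery time differs between matched and unmatched incidents. The sketch below assumes each incident record carries a `matched_pattern` flag and a `minutes_to_recover` value; both field names are hypothetical and the sample numbers are made up.

```python
from statistics import median


def recognition_stats(incidents):
    """Summarise recognition rate and median recovery time for a set of incidents."""
    matched = [i["minutes_to_recover"] for i in incidents if i["matched_pattern"]]
    unmatched = [i["minutes_to_recover"] for i in incidents if not i["matched_pattern"]]
    return {
        # Share of incidents that landed on an existing pattern.
        "recognition_rate": len(matched) / len(incidents) if incidents else 0.0,
        "median_minutes_matched": median(matched) if matched else None,
        "median_minutes_unmatched": median(unmatched) if unmatched else None,
    }


# Made-up records for illustration only.
sample = [
    {"matched_pattern": True, "minutes_to_recover": 20},
    {"matched_pattern": True, "minutes_to_recover": 30},
    {"matched_pattern": False, "minutes_to_recover": 90},
    {"matched_pattern": True, "minutes_to_recover": 40},
]
stats = recognition_stats(sample)
```

Comparing these numbers quarter over quarter shows whether recognition is actually getting faster, rather than relying on impressions from the bridge.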
Maintaining a pattern library is an operational discipline that pays off over years. Nova AI Ops integrates with incident telemetry, surfaces recurring patterns, and supports the team’s incident-learning discipline.