The Incident Archive: Why It Matters
Past incidents are training data. The archive is what makes that data accessible.
Structure
An incident archive is most useful when each incident is bundled, tagged consistently, and cross-linked to its remediation tickets. The structure is what turns “have we seen this before?” into a five-second answer instead of a Slack-search expedition.
- Per-incident artefact bundle. Timeline, postmortem, and action items kept together. Single source of truth per incident.
- Consistent tagging. Service, severity, cause-class tags matching the live-incident schema. Cross-incident analysis becomes possible.
- Linked action-item tickets. Each incident links to its remediation tickets. Follow-up tracking flows from the archive.
- Customer-impact section. Affected count and duration documented per incident. Business reviews work from real numbers.
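As a rough illustration of the bundle described above, the sketch below models one incident record in Python. All names here (`IncidentBundle`, `CustomerImpact`, `missing_parts`, the ticket-ID format, and the sample values) are hypothetical and chosen only for this example; a real archive would mirror whatever fields the live-incident schema already defines.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import timedelta


@dataclass
class CustomerImpact:
    """Customer-impact section: affected count and duration."""
    affected_count: int
    duration: timedelta


@dataclass
class IncidentBundle:
    """One archive entry: artefacts, tags, and ticket links together."""
    incident_id: str
    service: str          # tag, assumed to match the live-incident schema
    severity: str         # tag
    cause_class: str      # tag
    timeline: list[str]
    postmortem: str
    impact: CustomerImpact
    action_item_tickets: list[str] = field(default_factory=list)

    def tags(self) -> dict[str, str]:
        """Return the tags used for cross-incident analysis."""
        return {
            "service": self.service,
            "severity": self.severity,
            "cause_class": self.cause_class,
        }

    def missing_parts(self) -> list[str]:
        """List bundle parts that are still empty."""
        missing = []
        if not self.timeline:
            missing.append("timeline")
        if not self.postmortem.strip():
            missing.append("postmortem")
        if not self.action_item_tickets:
            missing.append("action_item_tickets")
        return missing


# Made-up sample data, for illustration only.
bundle = IncidentBundle(
    incident_id="INC-0001",
    service="checkout",
    severity="sev2",
    cause_class="config",
    timeline=["10:02 alert fired", "10:20 rollback started"],
    postmortem="Config rollout exhausted the connection pool.",
    impact=CustomerImpact(affected_count=1200, duration=timedelta(minutes=35)),
)
print(bundle.missing_parts())
```

A completeness check like `missing_parts` is one possible way to catch bundles that were archived without their remediation links.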
Search
Searchable archives turn history into a live tool. Full-text combined with tag filters finds prior cases in seconds; pattern recognition across incidents drives faster diagnosis.
- Full-text plus tag filter. Each search combines a text query with tag filters. Engineers find similar past incidents in seconds.
- Pattern recognition. “Have we seen this before” check before deep investigation. Saves the team from rediscovering known causes.
- Saved-query set per team. Bookmarked queries support recurring reviews and onboarding.
- Indexed-search backend. Elasticsearch, OpenSearch, or equivalent. Search performance holds as the archive grows.
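The combination of full-text and tag filtering can be sketched in a few lines of plain Python. This is an in-memory stand-in, not an indexed backend: it uses naive substring matching, and the record layout, the `search` and `run_saved` helpers, the `SAVED_QUERIES` name, and the sample incidents are all assumptions made for the example. A production archive would delegate the same query shape to its search engine.

```python
# Made-up sample archive, for illustration only.
ARCHIVE = [
    {
        "id": "INC-0001",
        "tags": {"service": "checkout", "severity": "sev2", "cause_class": "config"},
        "text": "Connection pool exhausted after config rollout",
    },
    {
        "id": "INC-0002",
        "tags": {"service": "checkout", "severity": "sev3", "cause_class": "capacity"},
        "text": "Connection pool exhausted during traffic spike",
    },
    {
        "id": "INC-0003",
        "tags": {"service": "search", "severity": "sev2", "cause_class": "config"},
        "text": "Stale index after config rollout",
    },
]


def search(incidents, text="", **tag_filters):
    """Return IDs of incidents matching every tag filter and every text term."""
    terms = text.lower().split()
    results = []
    for incident in incidents:
        # Tag filter: every requested tag must match exactly.
        if any(incident["tags"].get(k) != v for k, v in tag_filters.items()):
            continue
        # Text filter: every term must appear somewhere in the body
        # (case-insensitive substring match, not real tokenized search).
        body = incident["text"].lower()
        if all(term in body for term in terms):
            results.append(incident["id"])
    return results


# A saved-query set: named, reusable combinations of text and tags.
SAVED_QUERIES = {
    "checkout-pool": {"text": "connection pool", "service": "checkout"},
    "config-rollouts": {"text": "config rollout", "cause_class": "config"},
}


def run_saved(name, incidents=ARCHIVE):
    """Run a bookmarked query by name."""
    return search(incidents, **SAVED_QUERIES[name])


print(run_saved("checkout-pool"))
```

Keeping saved queries as plain data rather than code makes them easy to share in onboarding material and recurring reviews.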
Teach
The archive is training material for new responders. Real incidents teach better than runbooks; new on-calls who read recent incidents arrive at their first page with real pattern recognition instead of abstract guidance.
- New on-calls read archives. Each onboarding includes a reading list of recent incidents. Concrete cases beat abstract guidance.
- Better than runbooks. Real-incident pattern recognition transfers; runbooks alone cannot teach what production failure looks like.
- Curated reading set. Named-incident list per team. New responders arrive prepared instead of cold.
- Quarterly archive review session. Team-wide pattern review surfaces systemic causes that single-incident postmortems cannot.
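One way to prepare a quarterly review is to count how often the same tag combination recurs across incidents. The sketch below assumes each archived incident exposes `service` and `cause_class` tags as plain dictionary keys; the `recurring` helper, its parameters, and the sample data are hypothetical.

```python
from collections import Counter

# Made-up sample data, for illustration only.
INCIDENTS = [
    {"id": "INC-0001", "service": "checkout", "cause_class": "config"},
    {"id": "INC-0002", "service": "checkout", "cause_class": "capacity"},
    {"id": "INC-0003", "service": "search", "cause_class": "config"},
    {"id": "INC-0004", "service": "checkout", "cause_class": "config"},
]


def recurring(incidents, fields=("service", "cause_class"), min_count=2):
    """Return (tag values, count) pairs seen at least min_count times."""
    counts = Counter(tuple(inc[f] for f in fields) for inc in incidents)
    return [(key, n) for key, n in counts.most_common() if n >= min_count]


# Repeats per service and cause class, then per cause class alone.
print(recurring(INCIDENTS))
print(recurring(INCIDENTS, fields=("cause_class",)))
```

Grouping by cause class alone surfaces patterns that span services, which is the kind of systemic signal a single postmortem would not show.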