Near-Miss Tracking
Near-misses teach without the cost of incidents. Tracking them is how the team captures that learning.
Define
Near-miss tracking learns from events that almost became incidents. The definition has to be explicit so engineers know what to flag: an event that could have been a sev 2 but was stopped by luck or by a safety mechanism. No-blame framing is non-negotiable; without it, near-misses go unreported and the learning disappears.
- Almost-incident event. "Could have been a sev 2" framing. Sets the bar at production-impacting potential.
- Saved by luck or safety mechanism. The "what stopped it" question matters as much as what almost went wrong. Lucky saves are the most valuable to log.
- Documented definition per team. Explicit near-miss criteria in writing. Prevents inconsistent reporting across the team.
- No-blame framing. Safety-first reporting culture. Supports honest disclosure; without it, near-misses stop reaching the log.
Log
The log is lighter than a postmortem: a 15-minute write-up rather than a 90-minute one. Three fields carry most of the learning: what almost went wrong, what saved it, and what would prevent recurrence. The result is more learning per unit of engineer time than a full postmortem produces.
- What almost went wrong. Documented near-failure path. Captured while memory is fresh.
- What saved it. Explicit "what stopped it" answer. Reveals hidden safety mechanisms the team did not know it had.
- What would prevent recurrence. Named control. Drives the action item that follows.
- 15-minute write-up. Lighter than a postmortem by design. More learning per unit time than full postmortems.
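The three fields above can be sketched as a small record type. This is a minimal illustration, not a prescribed schema: the names NearMissEntry, SavedBy, and is_complete are hypothetical, and the sample entry is invented for the example.

```python
from dataclasses import dataclass, field
from datetime import date
from enum import Enum


class SavedBy(Enum):
    """How the event was stopped (hypothetical categories from the definition)."""
    LUCK = "luck"
    SAFETY_MECHANISM = "safety_mechanism"


@dataclass
class NearMissEntry:
    """One lightweight log entry: the three fields plus how it was saved."""
    what_almost_went_wrong: str
    what_saved_it: str
    prevention: str
    saved_by: SavedBy
    logged_on: date = field(default_factory=date.today)

    def is_complete(self) -> bool:
        # An entry is usable only if all three learning fields are filled in.
        fields = (self.what_almost_went_wrong, self.what_saved_it, self.prevention)
        return all(text.strip() for text in fields)


# Invented sample entry, for illustration only.
entry = NearMissEntry(
    what_almost_went_wrong="Migration script pointed at the production database",
    what_saved_it="Engineer noticed the hostname before pressing enter",
    prevention="Require an explicit confirmation flag for production targets",
    saved_by=SavedBy.LUCK,
)

blank = NearMissEntry("", "", "", SavedBy.SAFETY_MECHANISM)
```

Keeping the record this small is what keeps the write-up near 15 minutes; anything that needs more structure is probably a postmortem.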
Act
Near-misses are useful only if the team acts on them. Many reveal hidden risks the alerting did not surface; small, targeted action items accumulate into real reliability gains over quarters. A quarterly pattern review catches recurring themes that no single entry would reveal.
- Investigate every near-miss. Explicit follow-up per entry. Many reveal hidden risks.
- Small cumulative action items. Targeted fix per near-miss. Adds up over quarters of disciplined logging.
- Quarterly near-miss review. Pattern-recognition session. Catches recurring themes individual entries hide.
- Named owner per entry. Responsible engineer for the fix. Catches "noted but never fixed" entries before they outnumber resolved ones.
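The quarterly review and the owner check can be sketched in a few lines, assuming each entry carries a theme tag, an optional owner, and a resolved flag. ReviewItem, recurring_themes, and unowned_open are hypothetical names, and the sample data is invented.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReviewItem:
    """A logged near-miss as seen by the quarterly review (hypothetical shape)."""
    theme: str
    owner: Optional[str]
    resolved: bool


def recurring_themes(items, min_count=2):
    """Return themes that appear at least min_count times in the period."""
    counts = Counter(item.theme for item in items)
    return {theme: n for theme, n in counts.items() if n >= min_count}


def unowned_open(items):
    """Return unresolved entries with no named owner: the 'noted but never fixed' risk."""
    return [item for item in items if not item.resolved and item.owner is None]


# Invented sample quarter, for illustration only.
quarter = [
    ReviewItem(theme="config-rollout", owner="alice", resolved=True),
    ReviewItem(theme="config-rollout", owner=None, resolved=False),
    ReviewItem(theme="expired-cert", owner="bob", resolved=False),
]

themes = recurring_themes(quarter)
stale = unowned_open(quarter)
```

In this sample, "config-rollout" is the only theme that recurs, and the one unresolved entry without an owner is the item the review should assign before closing.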