Postmortem for Near-Misses
Learning without cost.
Overview
Near-miss postmortems run on incidents that almost happened but did not. A bug caught by a canary, a config change reverted before it propagated, a deploy gate that fired before customer impact. Aviation has run safety culture this way for decades because waiting for real outages is the slowest possible learning loop. Software organisations that adopt the practice get the same benefit.
- Learning without customer cost. The lesson arrives without the outage. The cheapest possible postmortem.
- What caught it. Per-near-miss the catching mechanism. The control that worked deserves credit and reinforcement.
- What could have happened. Counterfactual analysis. Investment scaled to the worst plausible outcome, not the actual one.
- Safe reporting plus aviation-style culture. Reporting near-misses must be safe; punishing reporters destroys the practice. Aviation rewards the report.
The approach
Three habits make near-miss postmortems work in software: encourage reporting actively, run a lighter postmortem template than for full incidents, and prioritise action items by counterfactual impact rather than actual.
- Encourage reporting. Make the channel obvious; thank reporters publicly. Without psychological safety, near-misses go unreported.
- Lightweight postmortem. Simpler template than full-incident. The bar to write must be low or near-misses get skipped.
- Action items by counterfactual impact. Prioritise by what could have happened, not what did. The system that almost broke gets fixed.
- Aviation-style culture plus documented policy. Reporting is rewarded, not punished; per-team the near-miss policy lives in the engineering handbook.
Why this compounds
Each near-miss postmortem deposits learning at the cost of writing it. The proactive incident-reduction loop runs ahead of customer impact; recurring failure modes get caught before they become incidents.
- Proactive incident reduction. Near-miss postmortems prevent real incidents that would otherwise have happened.
- Culture reinforced. Safe reporting supports team trust and retention.
- Learning compounds without cost. Each near-miss teaches the team something for free.
- Year-one investment, year-two habit. First near-miss postmortem feels strange. By the tenth, the team treats them as the cheapest learning available.