Postmortem for Near-Misses

Learning without cost.

Overview

Near-miss postmortems run on incidents that almost happened but did not. A bug caught by a canary, a config change reverted before it propagated, a deploy gate that fired before customer impact. Aviation has run safety culture this way for decades because waiting for real outages is the slowest possible learning loop. Software organisations that adopt the practice get the same benefit.

The approach

Three habits make near-miss postmortems work in software: encourage reporting actively, run a lighter postmortem template than for full incidents, and prioritise action items by counterfactual impact rather than actual.

Why this compounds

Each near-miss postmortem deposits learning at the cost of writing it. The proactive incident-reduction loop runs ahead of customer impact; recurring failure modes get caught before they become incidents.