Blameless Postmortem Template
Structured.
Standard sections
The standard sections enforce one discipline: say the same things in the same order for every incident. Pattern recognition compounds when every postmortem hits the same beats, and deviations stand out as signal rather than noise.
- Summary. Two or three sentences naming what happened, when, and who was affected. Readable in one breath.
- Timeline. Timestamped event log covering detection, investigation, and resolution. Times come from chat logs and paging tools, not memory.
- Customer impact. Quantified blast radius. Number of customers, request volume, duration, and revenue impact if measurable.
- Contributing factors. Multi-factor list rather than single root cause.
- Resolution. Explains what fixed it and whether the fix was intentional or accidental.
- Lessons. Capture new knowledge.
- Action items. Must be specific, owned, and deadline-bounded.
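The "specific, owned, and deadline-bounded" rule for action items can be checked mechanically before a postmortem ships. The following is a minimal sketch, not a prescribed implementation: the `ActionItem` fields, the `action_item_problems` helper, and the word-count threshold used as a proxy for "specific" are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class ActionItem:
    """One postmortem action item (field names are illustrative)."""
    description: str
    owner: Optional[str] = None
    due: Optional[date] = None


def action_item_problems(item: ActionItem, min_words: int = 4) -> list[str]:
    """Return the reasons an item is not yet specific, owned, and deadline-bounded."""
    problems = []
    # Crude proxy for "specific": a one- or two-word description is rarely actionable.
    if len(item.description.split()) < min_words:
        problems.append("description too vague")
    if not item.owner:
        problems.append("no owner")
    if item.due is None:
        problems.append("no deadline")
    return problems


# Made-up example items.
vague = ActionItem("Improve monitoring")
solid = ActionItem(
    "Add a pager alert for queue depth above 10k on the ingest service",
    owner="payments-oncall",
    due=date(2025, 3, 1),
)
print(action_item_problems(vague))  # ['description too vague', 'no owner', 'no deadline']
print(action_item_problems(solid))  # []
```

A word count cannot judge whether a description is really specific; it only catches the shortest placeholders, so a human reviewer still makes the final call.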
Blameless language patterns
Blameless language is its own discipline. Specific phrasings keep the focus on systems rather than individuals; the wrong verb in one sentence quietly turns a postmortem into a performance review.
- Focus on systems. "The deploy pipeline allowed an unreviewed change" rather than "Alice merged without review". The actor is the system the human worked through.
- Counterfactual reasoning. Ask "what conditions would have prevented this?" rather than "why did this person fail?". The first question generates fixes; the second generates defensiveness.
- Avoid loaded verbs. No "should have" and no "failed to". Both imply blame even when aimed at processes; rewrite as "the runbook did not cover this case" or similar.
- Published language guide. A short team document with example rewrites. New on-call engineers learn the patterns without trial and error.
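The loaded-verb rule lends itself to a simple lint pass over a draft. The sketch below is one possible approach, assuming a small hypothetical phrase list (`LOADED_PHRASES`) that a team's language guide would extend; it flags phrases for a human to rewrite and does not attempt the rewrite itself.

```python
import re

# Hypothetical starter list; a team's language guide would extend it.
LOADED_PHRASES = [
    r"should(?:\s+not|n't)?\s+have",
    r"should-have",
    r"failed(?:\s+|-)to",
]
LOADED_RE = re.compile("|".join(LOADED_PHRASES), re.IGNORECASE)


def find_loaded_phrases(text: str) -> list[tuple[int, str]]:
    """Return (line_number, matched_phrase) for each loaded phrase in a draft."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in LOADED_RE.finditer(line):
            hits.append((lineno, match.group(0)))
    return hits


# Made-up draft text for illustration.
draft = """The on-call engineer should have checked the dashboard.
The runbook did not cover this case.
The deploy job failed to validate the config."""

for lineno, phrase in find_loaded_phrases(draft):
    print(f"line {lineno}: '{phrase}'")
```

Line 3 of the sample is flagged even though its subject is a system, which matches the guidance that these verbs imply blame even when aimed at processes.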
Sections to drop or modify
Some traditional sections actively work against blameless review. Dropping them, or rewriting them with explicit guardrails, prevents the postmortem template itself from carrying blame forward.
- Single root cause. Drop the singular framing. Most incidents have multiple contributing factors, and naming one as "the cause" hides the others that need fixing.
- What went well. Keep only when it is specific about a process or system that worked. Generic praise slips into hagiography and dilutes the action items.
- Individual owner attribution. Do not credit or blame individuals for the incident itself. Owner names belong on action items, not on cause attribution.
- Explicit drop list in the template. The template names the sections that are intentionally absent. New authors do not reintroduce them by accident.
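An explicit drop list can also be enforced when a draft is submitted. This is a minimal sketch under stated assumptions: the heading names in `DROPPED_SECTIONS` are hypothetical stand-ins for whatever the template actually lists as intentionally absent, and headings are assumed to be available as plain strings.

```python
# Hypothetical drop list; the real one comes from the team's template.
DROPPED_SECTIONS = {"root cause", "responsible party"}


def reintroduced_sections(headings: list[str]) -> list[str]:
    """Return headings in a draft that match an intentionally dropped section."""
    return [h for h in headings if h.strip().lower() in DROPPED_SECTIONS]


# Made-up draft headings for illustration.
draft_headings = ["Summary", "Timeline", "Root Cause", "Contributing factors"]
print(reintroduced_sections(draft_headings))  # ['Root Cause']
```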
Sections worth adding
Some less-common sections add real value. A near-miss list, open questions, related incidents, and rollback timing surface signal that the standard template misses.
- What we got lucky on. Near-miss bullets. Surfaces hidden risks where a small detail prevented a worse outcome and that detail is not guaranteed next time.
- Open questions. Items the postmortem could not resolve. Documenting them ensures follow-up rather than letting them disappear into chat history.
- Related incidents. Pattern reference. The third similar incident is a systemic issue, not three independent bugs.
- Rollback timing. Actual versus ideal rollback time. Drives shorter MTTR by making the gap visible and actionable.
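The rollback-timing gap is simple arithmetic once the timeline has trustworthy timestamps. The sketch below assumes "actual" means the time from detection to completed rollback and "ideal" is a target the team chooses; both definitions, the `rollback_gap` name, and the sample timestamps are illustrative assumptions.

```python
from datetime import datetime, timedelta


def rollback_gap(detected_at: datetime, rolled_back_at: datetime,
                 ideal: timedelta) -> timedelta:
    """Return how much longer the rollback took than the team's ideal target."""
    actual = rolled_back_at - detected_at
    return actual - ideal


# Made-up timestamps for illustration.
detected = datetime(2025, 1, 14, 9, 2)
rolled_back = datetime(2025, 1, 14, 9, 49)
gap = rollback_gap(detected, rolled_back, ideal=timedelta(minutes=10))
print(gap)  # 0:37:00
```

A positive gap is the number the postmortem should explain, for example time spent deciding whether to roll back at all.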
Review and publishing
Review and publishing close the loop. Without follow-through, postmortems are theatre; the discipline is what turns the document into operational change.
- Internal review. The incident commander (IC), the on-call engineer, and the service owner read the draft before it ships. Review refines content, catches inaccuracies, and surfaces missed factors.
- Distribution. Engineering-forum share plus monthly engineering review. Visibility is what creates the cross-team learning postmortems are meant to enable.
- Action item tracking. Tied to the team's ticketing system. Items remain visible until shipped, not buried in the postmortem doc.
- Quarterly action-item retro. Open-action review every quarter. Catches stalled fixes before they become the contributing factor in the next incident.
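The quarterly retro needs a list of action items that are still open past their deadline. The following sketch assumes items have already been exported from the ticketing system into simple records; the `TrackedItem` fields and ticket identifiers are hypothetical, and no particular ticketing API is implied.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class TrackedItem:
    """An action item as exported from a ticketing system (fields are illustrative)."""
    ticket: str
    due: date
    done: bool = False


def stalled_items(items: list[TrackedItem], today: date) -> list[TrackedItem]:
    """Return open items past their deadline, most overdue first."""
    overdue = [i for i in items if not i.done and i.due < today]
    return sorted(overdue, key=lambda i: i.due)


# Made-up items for illustration.
items = [
    TrackedItem("OPS-101", date(2025, 1, 15)),
    TrackedItem("OPS-102", date(2024, 11, 30)),
    TrackedItem("OPS-103", date(2024, 12, 10), done=True),
    TrackedItem("OPS-104", date(2025, 6, 1)),
]
for item in stalled_items(items, today=date(2025, 4, 1)):
    print(item.ticket, item.due.isoformat())
```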