Incident Retrospectives That Actually Change Behaviour
Most retrospectives produce a list of action items that nobody owns and nobody ships. This piece covers the format and rules that turn the meeting from theatre into engineering input.
When retros become theatre
A retro that produces 12 action items, none of which has a name and a due date attached, is a feel-good ritual. The team feels like progress was made; the next incident reveals it was not. The fix is structural.
The pathology has a recognisable shape. The retro happens; it runs 90 minutes; everyone leaves feeling like they processed the incident. Two weeks later, nobody can name the action items. Two months later, a similar incident happens. The retro happens again, identifying many of the same contributing factors. The team feels productive in each meeting and is, in aggregate, learning nothing.
The structural fix is to treat the retro as a producer of artifacts, not as therapy. Two artifacts: a postmortem document (durable record) and a list of named action items with due dates (commitment to change). If a retro produces neither, it was theatre. If it produces both, even an awkward retro creates change.
The four sections
- Timeline: facts only, no interpretation. 10 minutes.
- What worked: what about the response is worth keeping. 10 minutes.
- Contributing factors: not "root cause", three to seven things that contributed. 20 minutes.
- Action items: with owner + size + due. 15 minutes.
Total: 55 minutes. Past 60 the energy collapses, so the incident commander (IC) ends the meeting on time. Most teams find their first few retros run over because they haven't internalised the time-boxing yet; with practice, the four sections fit.
The order matters. Timeline first (everyone aligns on facts before opinions). What-worked second (preserves what's good, builds team confidence). Contributing-factors third (the analytical work, when energy is highest). Action items last (commitments, when the team has the most context). Reordering reliably degrades the meeting.
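The time boxes and their order can be written down as a small agenda that the facilitator checks before the meeting. This is a minimal sketch, not a prescribed tool: the names `Section`, `AGENDA`, and `schedule` are hypothetical, and the 60-minute hard stop is taken from the limit described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Section:
    name: str
    minutes: int


# Time boxes and order taken from the four sections above.
AGENDA = [
    Section("Timeline", 10),
    Section("What worked", 10),
    Section("Contributing factors", 20),
    Section("Action items", 15),
]


def schedule(agenda, hard_stop=60):
    """Return (section name, start minute, end minute) for each section in order."""
    slots = []
    clock = 0
    for section in agenda:
        slots.append((section.name, clock, clock + section.minutes))
        clock += section.minutes
    if clock > hard_stop:
        raise ValueError(f"agenda runs {clock} min, past the {hard_stop}-minute limit")
    return slots


for name, start, end in schedule(AGENDA):
    print(f"{start:02d}-{end:02d}  {name}")
```

Adding a fifth section or stretching a time box past the hour raises an error, which forces the trade-off to be made before the meeting rather than during it.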
Rules of engagement
No "if only Sara had..." statements. Replace person language with system language. The IC interrupts when the conversation drifts to blame. The interruption is what enforces the rule; without it, well-intentioned engineers will drift to blame because that's the natural mode of "what went wrong" thinking.
Other rules. (1) The most senior person speaks last. Senior engineers speaking first anchors everyone else; speaking last lets junior engineers contribute their actual observations. (2) The retro is recorded but not shared widely. The recording is for the IC's reference; the public artifact is the postmortem document. (3) Customer impact is described as the customer experienced it, not as the team experienced it. "Customers couldn't log in for 47 minutes" beats "the auth service had elevated error rates."
The hardest rule to enforce: the no-blame rule when senior engineers blame themselves. "I should have caught this in code review" sounds like accountability and is actually self-blame. The system question is "what about the code review process let this through?" Self-blame in a retro is comfortable for the speaker and corrosive for the team's learning.
Single owner per item
Every action item has exactly one name. Two names = no name. The owner doesn't have to do all the work, but they own the outcome and post the status. Without single ownership, action items die in the comfortable diffusion of "we should all..."
The owner-versus-assignee distinction matters. An assignee is the person who will do the work. An owner is the person responsible for the outcome; they may delegate the work, recruit help, or escalate when blocked. Most retros pick an "assignee" by default; the right move is to pick an owner. Owners drive completion; assignees do tasks.
The owner is usually present at the retro. Avoid assigning action items to people who weren't there: they didn't agree to the action item, didn't hear the context, and won't have the same conviction as the people in the room. If an action item belongs to someone not present, the retro defers committing it until the right person can confirm.
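The ownership rules in this section are mechanical enough to check before an item is written into the postmortem. The sketch below assumes a hypothetical `ActionItem` record and a hypothetical `commit_problems` helper; the people named in the usage lines are placeholders, not part of any real tracker.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ActionItem:
    title: str
    owners: list  # names proposed in the room; exactly one is allowed
    due: date


def commit_problems(item, attendees):
    """Return reasons the item cannot be committed yet; an empty list means it can."""
    if len(item.owners) == 0:
        return ["no owner"]
    if len(item.owners) > 1:
        return ["more than one owner: pick the person who owns the outcome"]
    if item.owners[0] not in attendees:
        return ["owner was not at the retro: defer until they confirm"]
    return []


# Hypothetical attendees and item, for illustration only.
attendees = {"Sara", "Jon"}
item = ActionItem("Fix one auth runbook", ["Sara", "Jon"], date(2024, 3, 15))
print(commit_problems(item, attendees))
```

Helpers who do part of the work can be recorded elsewhere; the point of the check is that the `owners` field never holds more than one name.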
Sizing the action item
Action items get a t-shirt size: S (one day), M (one week), L (one sprint). Anything XL belongs in a roadmap proposal, not in this retro. The size constraint is what keeps action items shippable.
The reason for the constraint: large action items don't ship. "Re-architect the alerting platform" is the right thing to do AND it won't ship from a retro action-items list. The retro produces small, focused improvements that compound; large initiatives belong in a different planning surface.
How to right-size. Take the proposed action item; if it can't be done in a sprint, decompose it. "Better runbook quality" → "review runbooks for the auth service this sprint, identify the 3 worst, fix one." The sprint-shaped chunk has clear completion criteria; the year-shaped chunk doesn't.
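The size constraint can be enforced at the moment an item is recorded. This is a sketch under the sizes given above (S, M, L); the `Size` enum and the `accept_for_retro` function are hypothetical names chosen for illustration.

```python
from enum import Enum


class Size(Enum):
    S = "one day"
    M = "one week"
    L = "one sprint"


def accept_for_retro(title, size_label):
    """Return the Size for an item, or reject anything that is not S, M, or L."""
    try:
        return Size[size_label.strip().upper()]
    except KeyError:
        raise ValueError(
            f"{title!r} is sized {size_label!r}: decompose it to sprint-sized "
            "pieces or move it to a roadmap proposal"
        ) from None


print(accept_for_retro("Fix the worst auth runbook", "M").value)
```

An item labelled XL is refused rather than silently accepted, which is the prompt to decompose it in the room.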
The discipline pays off in the 30-day follow-up. Most teams find that 70-80% of small action items ship; only 20-30% of large ones do. Right-sizing is what makes the action-items list a real driver of change rather than a wishlist.
The contributing-factors frame
Most postmortems use "root cause" and find one. The frame is wrong; modern systems break in too many directions for a single causal chain to hold. Use "contributing factors" instead: list 3-7 things that contributed to the incident, with no ranking and no winner.
The benefit. Each contributing factor becomes a candidate for an action item. A team that finds 5 contributing factors has 5 candidate improvements; a team that finds "the root cause" has 1. The mental model of multiple contributing factors produces more granular and shippable improvements.
The discipline. List factors before debating them. Each engineer writes their factors silently for 5 minutes; the IC then reads them all aloud and consolidates. The silent-write step prevents the loudest engineer's theory from anchoring the discussion. Consolidation produces a list of distinct factors: duplicates are obvious, and spurious ones don't survive the read-aloud.
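If the silent-write step happens in a shared document or chat, the first pass of consolidation can be sketched in a few lines. This only merges factors that are worded identically apart from case and spacing; judging which differently worded notes describe the same factor remains the facilitator's job. The function name `consolidate` and the sample notes are hypothetical.

```python
def consolidate(notes_by_engineer):
    """Merge silently written factors, keeping first-seen order and who raised each."""
    merged = {}
    for engineer, factors in notes_by_engineer.items():
        for factor in factors:
            # Normalise case and whitespace so trivially identical notes collapse.
            key = " ".join(factor.lower().split())
            if not key:
                continue
            entry = merged.setdefault(key, {"text": factor.strip(), "raised_by": []})
            if engineer not in entry["raised_by"]:
                entry["raised_by"].append(engineer)
    return list(merged.values())


# Hypothetical notes, for illustration only.
notes = {
    "engineer_a": ["No alert on auth error rate", "Runbook out of date"],
    "engineer_b": ["no alert on  auth error rate"],
}
for factor in consolidate(notes):
    print(factor["text"], factor["raised_by"])
```

The output keeps the order in which factors were first written rather than sorting by how many people raised them, in keeping with the no-ranking rule.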
30-day follow-up
30 days after every retro, the IC does a one-paragraph status check on every action item. Closed: cite the PR. In progress: cite where it is. Stalled: explain why. Without the 30-day check, the meeting was theatre.
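The check is simple enough to template. The sketch below computes the check date and refuses a status line that lacks the supporting detail each state requires. The names `check_date`, `Status`, and `status_line` are hypothetical, and the detail string in the usage lines is a placeholder rather than a real PR reference.

```python
from dataclasses import dataclass
from datetime import date, timedelta


def check_date(retro_date):
    """The follow-up is due 30 days after the retro."""
    return retro_date + timedelta(days=30)


@dataclass
class Status:
    title: str
    owner: str
    state: str   # "closed", "in progress", or "stalled"
    detail: str  # the PR, where the work currently is, or why it stalled


# What each state must cite, per the rule above.
REQUIRED_DETAIL = {"closed": "the PR", "in progress": "where it is", "stalled": "why"}


def status_line(status):
    """Format one line of the 30-day check, rejecting entries with no evidence."""
    if status.state not in REQUIRED_DETAIL:
        raise ValueError(f"unknown state {status.state!r}")
    if not status.detail.strip():
        needed = REQUIRED_DETAIL[status.state]
        raise ValueError(f"{status.title!r} is {status.state} but does not cite {needed}")
    return f"{status.title} ({status.owner}): {status.state}: {status.detail}"


print(check_date(date(2024, 3, 1)))
print(status_line(Status("Fix one auth runbook", "Sara", "closed", "<PR link>")))
```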
The 30-day check is also where the action-item-completion rate is measured. Healthy teams complete 70-80% of action items within 30 days; below 50% indicates the action items were too large or owners weren't actually committed. The metric is a calibration signal on the team's retro discipline, not a performance review item for individuals.
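The completion rate is a single division, but writing it down makes the thresholds explicit. In this sketch the 70% and 50% cut-offs come from the paragraph above; how to label the band between them, and treating anything at or above 70% as healthy, are assumptions. The function names are hypothetical.

```python
def completion_rate(states):
    """Fraction of action items closed at the 30-day check."""
    if not states:
        raise ValueError("no action items to measure")
    closed = sum(1 for state in states if state == "closed")
    return closed / len(states)


def calibration(rate):
    """Read the rate as a signal about the team's retro discipline, not individuals."""
    if rate >= 0.7:
        return "healthy"
    if rate < 0.5:
        return "items too large or owners not committed"
    # Assumption: the source gives no label for the band between 50% and 70%.
    return "between thresholds: watch the trend"


states = ["closed"] * 7 + ["in progress"] * 2 + ["stalled"]
rate = completion_rate(states)
print(f"{rate:.0%}", calibration(rate))
```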
What to do with stalled action items. Discuss in the next retro: was the item the wrong action, or the wrong size, or owned by the wrong person? Sometimes the right move is to drop it (the situation has evolved); sometimes the right move is to escalate (it's structurally important but blocked); sometimes the right move is to re-decompose (it was XL even if it looked L). The check creates the moment to decide.
Building the retro culture
The first 3-5 retros after introducing this format will feel awkward. Engineers don't know how to use the timeline section, drift to blame in contributing factors, and over-commit on action items. That's normal. Retros become smooth around the 6-month mark when the team has internalised the structure.
Sponsorship matters. The engineering manager (EM) being present and modelling the rules matters more than any document. The first time the EM says "let's reframe that: what about the system made this easy?" rather than letting blame language stand, the team learns the rule by example. Process documents teach format; the EM teaches culture.
What not to do. Don't make retro attendance optional; don't skip retros for "small" incidents (the small ones often reveal the most about systems); don't merge retros for multiple incidents (each incident's lessons get diluted); don't run retros without a designated facilitator (the meeting drifts).
What to do this week
Three moves. (1) After your next incident, run the retro using these four sections explicitly, and call out which section you're in. The labels alone change the conversation. (2) Adopt the contributing-factors frame; ban "root cause" from retros for one quarter as an experiment. (3) Schedule the 30-day check on every retro's action items. The recurring calendar event is the simplest mechanism; the 5 minutes per check produce more change than the 60 minutes of the retro itself.