Multi-Team Postmortem Coordination
Setup and attendees
A multi-team postmortem works only when both sides are in the room with the right framing. Setup decides whether the meeting is useful: a lead plus the on-call engineer from each side, a neutral facilitator, a pre-read circulated 24 hours ahead, and a time-box on the conversation.
- Both team leads plus on-call engineers. Attendees must be senior enough to commit to action items without going back for approval; without that authority, the meeting becomes a status report.
- Neutral facilitator. Someone from the platform team or a senior SRE facilitates; a neutral party reduces team-versus-team dynamics and keeps the conversation focused on the system.
- Pre-read circulated. Send the timeline, impact summary, and contributing factors 24 hours before the meeting; the meeting is for analysis, not catching up.
- Explicit time-box. A 60-90 minute window keeps the focus on action items rather than rehashing the incident in detail.
Framing the conversation
Framing decides whether the meeting stays blameless or devolves into team-versus-team finger-pointing. The facilitator enforces the framing, and the framing is the load-bearing piece.
- Combined-systems question. Ask "How did our two systems combine to cause this?", not "Whose fault was it?"
- Own-team responsibility. Each team names its own contributing factors; the other team listens rather than assigning blame.
- Interaction as a factor. Treat the integration itself as a contributing factor; often neither team alone caused the incident, the integration did.
- Visible language guide. Blameless rules posted on the wall or shared screen catch old habits before they derail the meeting.
Walking the timeline
The timeline walk is where insights land. A single interleaved timeline with both teams' observations beats two parallel narratives because the integration is what failed.
- Single interleaved timeline. Put both teams' actions on one timeline; parallel narratives hide the moment the systems diverged.
- Per-moment observations. For each event, ask: what did Team A see, what did Team B see, and were they aware of each other?
- Locally correct, globally unexpected. A common pattern: the action that made sense locally produced an outcome neither team predicted.
- Timestamped log link per event. Link each event to its source-of-truth log so claims can be verified independently when recollections diverge.
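The interleaving step can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed tool: the `Event` fields, the `interleave` helper, the sample events, and the `logs.example` URLs are all hypothetical, and timestamps are assumed to be in a single shared timezone.

```python
from dataclasses import dataclass
from datetime import datetime
from itertools import chain


@dataclass(frozen=True)
class Event:
    at: datetime   # when it happened (one shared timezone assumed)
    team: str      # which team observed or acted
    note: str      # what the team saw or did
    log_url: str   # link to the source-of-truth log (hypothetical URL)


def interleave(*timelines):
    """Combine per-team event lists into one timeline ordered by time."""
    return sorted(chain.from_iterable(timelines), key=lambda e: e.at)


# Hypothetical sample data for two teams.
team_a = [
    Event(datetime(2024, 1, 1, 10, 0), "A", "Deployed config change",
          "https://logs.example/a/1"),
    Event(datetime(2024, 1, 1, 10, 7), "A", "Error-rate alert fired",
          "https://logs.example/a/2"),
]
team_b = [
    Event(datetime(2024, 1, 1, 10, 3), "B", "Queue depth started climbing",
          "https://logs.example/b/1"),
]

timeline = interleave(team_a, team_b)
for e in timeline:
    print(f"{e.at:%H:%M} [{e.team}] {e.note} ({e.log_url})")
```

Reading the merged output row by row makes it easier to ask, for each moment, what the other team could see at that time.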
Action items
Action items are the meeting's output. They come in three kinds: per-team items, joint items with shared ownership, and process items that change how the teams work together.
- Per-team action items. Changes to one team's system or process; each has an owner and a deadline, and is tracked the same way as single-team postmortem actions.
- Joint action items. Cross-team changes with joint ownership and a designated lead; this is the most failure-prone item type, so track it closely.
- Process action items. A shared on-call channel, a joint runbook, or a regular sync; these are often the highest-leverage items.
- Visible status per action. A tracked ticket for each action catches stalled fixes before the next cross-team incident.
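One way to make these rules checkable is a small record type plus two helpers. The sketch below is an assumption-laden illustration: the `ActionItem` fields, the `problems` and `overdue` helpers, the ticket ID format, and the sample items are hypothetical and not tied to any particular tracker.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Tuple


@dataclass
class ActionItem:
    title: str
    kind: str                  # "team", "joint", or "process"
    owners: Tuple[str, ...]    # owning team names
    lead: Optional[str]        # designated lead; expected for joint items
    due: Optional[date]        # deadline
    ticket: Optional[str]      # tracker ID (hypothetical format)
    done: bool = False


def problems(item):
    """Return the reasons an item is not yet a trackable deliverable."""
    found = []
    if not item.owners:
        found.append("no owner")
    if item.due is None:
        found.append("no deadline")
    if item.ticket is None:
        found.append("no ticket")
    if item.kind == "joint" and item.lead is None:
        found.append("joint item without a designated lead")
    return found


def overdue(items, today):
    """Return open items whose deadline has already passed."""
    return [i for i in items
            if not i.done and i.due is not None and i.due < today]


# Hypothetical sample items.
items = [
    ActionItem("Joint runbook for queue failover", "joint", ("A", "B"),
               "alice", date(2024, 2, 1), "OPS-1"),
    ActionItem("Improve communication", "joint", ("A", "B"),
               None, None, None),
]

for item in items:
    print(item.title, "->", problems(item) or "ok")
print("Overdue:", [i.title for i in overdue(items, date(2024, 3, 1))])
```

Running `problems` before the meeting closes gives a quick check that every item has an owner, a deadline, and a ticket; `overdue` can feed a periodic status review.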
Pitfalls
The pitfalls are predictable: skipping the synchronous meeting in favour of async, letting one team dominate the conversation, and accepting vague action items as deliverables.
- Async-only postmortems. Hold a synchronous meeting for every incident; tone gets lost in writing, and tensions escalate when teams write at each other.
- One-team-dominated meeting. The facilitator enforces equal speaking time; both teams' perspectives matter equally regardless of any seniority gap.
- Vague action items. Test each action for specificity; "Joint runbook by date X" is specific, "improve communication" is not.
- Quarterly cross-team retro. A follow-up retrospective each quarter catches recurring patterns that single-incident postmortems miss.