The Deploy Postmortem Agent: First Pass at the Writeup
A postmortem is a writing task and a forensics task. The agent that handles the forensics and produces a writeup that is 70% finished, leaving the analysis to humans.
Forensics: what happened
Pull the timeline: alerts that fired, services affected, deploys made, actions taken.
Pull the metrics: errors, latency, traffic. The agent extracts the time window of impact and characterises the drop.
Pull the deploy logs: what was deployed, by whom, with what changes. The agent links deploys to the impact window.
Draft the writeup
Section 1: summary (what happened, when, who was affected, total duration). The agent gets this 90% right because it is mechanical.
Section 2: timeline (the what-happened-when, with action attribution). Mechanical; agent is reliable.
Section 3: contributing factors (what allowed the incident to happen). The agent proposes; humans refine.
Section 4: action items (what to do to prevent recurrence). The agent proposes; humans decide ownership and priority.
70% finished
The agent's draft handles the rote sections (summary, timeline). The human owns the analytical sections (contributing factors, action items).
70% finished means the human starts from a draft, not a blank page. Time-to-postmortem drops from 4 hours to 90 minutes.
Resist drafting the analytical sections. The agent will produce plausible-looking text that misses the actual contributing factors. Save those for humans.
Human review checklist
Verify the timeline: are all major events represented? Did the agent miss something?
Verify the impact: does the duration and scope match what the team experienced?
Refine contributing factors: the agent suggested some; check whether they are right and what is missing.
Set action items: the agent's suggestions are starting points; humans pick the ones that matter and assign owners.
What the agent learns over time
Postmortem templates: each team has its preferences. The agent learns which sections to include and which to skip.
Common contributing factors: "deploy without canary" or "monitoring gap" appear repeatedly. The agent's prompt is updated to surface these.
Action item patterns: the team's culture about ownership, deadlines, follow-up. The agent's drafts adapt over months.