The Deploy Postmortem Agent: First Pass at the Writeup

A postmortem is both a forensics task and a writing task. The agent handles the forensics and produces a writeup that is about 70% finished, leaving the analysis to humans.

Forensics: what happened

The forensics step is mechanical. The agent pulls the timeline, the metrics, and the deploy logs, then stitches the three views into a single ordered narrative.

Timeline pull. Alerts that fired, services affected, deploys made, actions taken. One ordered list with timestamps and actors.
Metrics extract. Errors, latency, traffic. The agent identifies the time window of impact and characterises the degradation with concrete numbers.
Deploy log link. What was deployed, by whom, with what changes. The agent links deploys to the impact window so the “did a deploy do this” question is answered up front.
Citation requirement. Every claim in the forensics section cites the underlying log line or metric query. The reviewer can re-trace the agent’s steps.
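
The stitching step above can be sketched in a few lines of Python. This is a minimal illustration, not the agent's actual implementation: the Event fields, the citation strings, the sample data, and the 30-minute lookback are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass(frozen=True)
class Event:
    at: datetime
    kind: str    # "alert", "deploy", or "action"
    actor: str
    what: str
    source: str  # citation: log line or metric query (hypothetical format)

def build_timeline(*feeds):
    """Merge per-source event lists into one list ordered by timestamp."""
    merged = [event for feed in feeds for event in feed]
    # Citation requirement: refuse any event that cannot be traced back.
    if any(not event.source for event in merged):
        raise ValueError("every event must cite a log line or metric query")
    return sorted(merged, key=lambda event: event.at)

def deploys_near_impact(timeline, impact_start, impact_end,
                        lookback=timedelta(minutes=30)):
    """Return deploys made during the impact window or shortly before it."""
    return [
        event for event in timeline
        if event.kind == "deploy"
        and impact_start - lookback <= event.at <= impact_end
    ]

# Hypothetical sample data, for illustration only.
alerts = [Event(datetime(2024, 1, 1, 10, 5), "alert", "pager",
                "error-rate alert fired", "alert-log:17")]
deploys = [Event(datetime(2024, 1, 1, 9, 58), "deploy", "alice",
                 "api rollout", "deploy-log:42")]
actions = [Event(datetime(2024, 1, 1, 10, 20), "action", "bob",
                 "rolled back api", "chat-log:88")]

timeline = build_timeline(alerts, deploys, actions)
suspects = deploys_near_impact(
    timeline,
    impact_start=datetime(2024, 1, 1, 10, 0),
    impact_end=datetime(2024, 1, 1, 10, 25),
)
```

Rejecting uncited events at merge time, rather than at review time, keeps the "every claim cites its source" rule from depending on reviewer diligence.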

Draft the writeup

The draft splits into mechanical sections the agent owns and analytical sections the human owns. Mixing the two produces postmortems that read smoothly but reason poorly.

Section 1: summary. What happened, when, who was affected, total duration. The agent gets this about 90% right because it is mechanical.
Section 2: timeline. The what-happened-when, with action attribution. Mechanical; the agent is reliable.
Section 3: contributing factors. What allowed the incident to happen. The agent proposes; humans refine.
Section 4: action items. What to do to prevent recurrence. The agent proposes; humans decide ownership and priority.
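
One way to keep the ownership split explicit is to record it in the draft's data model, so the rendered text always shows which sections are agent proposals. The sketch below assumes hypothetical names (Owner, Section, new_draft, render) and a hypothetical label format; only the four section titles and their owners come from the list above.

```python
from dataclasses import dataclass
from enum import Enum

class Owner(Enum):
    AGENT = "agent"
    HUMAN = "human"

@dataclass
class Section:
    title: str
    owner: Owner
    body: str = ""

def new_draft():
    """Return the four sections with the ownership described above."""
    return [
        Section("Summary", Owner.AGENT),
        Section("Timeline", Owner.AGENT),
        Section("Contributing factors", Owner.HUMAN),
        Section("Action items", Owner.HUMAN),
    ]

def render(sections):
    """Render plain text, tagging human-owned sections as proposals."""
    lines = []
    for number, section in enumerate(sections, start=1):
        tag = "" if section.owner is Owner.AGENT else " [agent proposal, human decides]"
        lines.append(f"Section {number}: {section.title}{tag}")
        lines.append(section.body or "(empty)")
    return "\n".join(lines)

draft = new_draft()
text = render(draft)
```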

70% finished

The shipped artefact is a 70% draft, not a finished document. The number is set so that the human reviewer still owns the analysis but starts from real text rather than a blank page.

Rote sections owned. The draft handles summary and timeline. The human owns contributing factors and action items.
Time savings. Time-to-postmortem drops from roughly 4 hours to 90 minutes, a saving of about 2.5 hours per incident. Across many incidents per quarter, that adds up.
Resist analytical drafts. On contributing factors, the agent will produce plausible-looking text that misses the real causes. Treat its output there as suggestions and leave the analysis to humans.
Visible draft state. The artefact is labelled “agent draft” until human review marks it final. Reviewers should not assume polish equals correctness.
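
The visible draft state can be as simple as a status field that only a named human reviewer can change. The following is a sketch under that assumption; the Postmortem class, its fields, and the banner format are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Postmortem:
    title: str
    status: str = "agent draft"        # every artefact starts labelled as a draft
    reviewed_by: Optional[str] = None

    def mark_final(self, reviewer: str) -> None:
        """Flip the label to final; requires a named human reviewer."""
        if not reviewer.strip():
            raise ValueError("a named human reviewer is required")
        self.status = "final"
        self.reviewed_by = reviewer

    def banner(self) -> str:
        """Label shown at the top of the document."""
        return f"[{self.status.upper()}] {self.title}"

pm = Postmortem("Checkout outage")     # hypothetical incident title
draft_banner = pm.banner()
pm.mark_final("dana")                  # hypothetical reviewer
final_banner = pm.banner()
```

Putting the label in the banner rather than in metadata means a reader sees the draft state before reading any polished-looking text.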

Human review checklist

The review checklist is short and explicit. Each item names what the human is looking for so review stays consistent across teams.

Timeline completeness. Are all major events represented? Did the agent miss anything that mattered?
Impact accuracy. Do the duration and scope match what the team experienced? Pay particular attention to the wording of customer impact.
Contributing factors refinement. The agent suggested some. Check whether they are right and what is missing.
Action item selection. Agent suggestions are starting points; humans pick the ones that matter and assign owners with deadlines.
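
To keep review consistent across teams, the checklist can be held as data and checked before a draft is marked final. This is a minimal sketch; the outstanding helper and the idea of tracking sign-off as a set of item names are assumptions, while the four item names come from the list above.

```python
CHECKLIST = (
    "Timeline completeness",
    "Impact accuracy",
    "Contributing factors refinement",
    "Action item selection",
)

def outstanding(signed_off):
    """Return checklist items not yet signed off, in checklist order."""
    unknown = set(signed_off) - set(CHECKLIST)
    if unknown:
        raise ValueError(f"unknown checklist items: {sorted(unknown)}")
    return [item for item in CHECKLIST if item not in signed_off]

# A reviewer who has only checked the mechanical sections so far.
remaining = outstanding({"Timeline completeness", "Impact accuracy"})
```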

What the agent learns over time

The agent gets better at this work with feedback. The four signals below feed prompt updates and keep drafts aligned with team conventions.

Template preferences. Each team has its own postmortem shape. The agent learns which sections to include and which to skip per team.
Common contributing factors. Factors such as “deploy without canary” and “monitoring gap” recur. The agent’s prompt is updated to surface these explicitly.
Action-item patterns. The team’s culture around ownership, deadlines, and follow-up shows up in the action items section. The drafts adapt over months.
Reviewer feedback loop. Edits on prior drafts feed an eval set; the next prompt change is measured against it before shipping.
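
The reviewer feedback loop can be sketched as a small regression check: each eval case pairs an incident with the human-edited final text, and a prompt change ships only if its drafts land closer to those finals than the current prompt's drafts do. The similarity measure here (difflib's SequenceMatcher ratio) is a stand-in chosen for illustration, not a claim about the metric actually used; the function names, the eval-set shape, and the stub generators are likewise hypothetical.

```python
from difflib import SequenceMatcher
from statistics import mean

def similarity(draft: str, final: str) -> float:
    """Rough 0..1 closeness of an agent draft to the human-edited final."""
    return SequenceMatcher(None, draft, final).ratio()

def score(generate, eval_set) -> float:
    """Average similarity of generated drafts to reviewed finals."""
    return mean(
        similarity(generate(case["incident"]), case["final"])
        for case in eval_set
    )

def should_ship(candidate, baseline, eval_set, min_gain=0.0) -> bool:
    """Ship only if the candidate prompt beats the baseline on the eval set."""
    return score(candidate, eval_set) - score(baseline, eval_set) > min_gain

# Hypothetical eval case and stub generators standing in for two prompts.
eval_set = [{"incident": "INC-1",
             "final": "Checkout errors for 20 minutes after a deploy."}]

def baseline(incident):
    return ""  # stub: current prompt produces nothing useful

def candidate(incident):
    return "Checkout errors for 20 minutes after a deploy."  # stub: matches the final

ship = should_ship(candidate, baseline, eval_set)
```

A string-similarity score rewards matching the reviewers' wording, so it suits the mechanical sections best; it says little about whether proposed contributing factors were correct.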