From Runbook to Agent: A Translation Pattern
Most SRE runbooks already encode an agent. The five-step pattern that turns a Confluence page into a deployable agent, with the parts you should keep and the parts you should drop.
Audit the runbook before you translate
Most runbooks have three layers tangled together: detection ("if you see X"), reasoning ("this usually means Y"), and action ("run Z"). Translation works only when you separate them. Read the runbook with three highlighters; mark each layer.
If the detection layer is missing, the runbook is reactive: it kicks in only after a human has noticed a problem. An agent needs an explicit detection signal: an alert, a metric threshold, an event subscription. If the detection layer is implicit, add it.
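To make that concrete, here is a minimal sketch of a metric-threshold detection signal in Python. The 200ms threshold, the five-sample window, and the assumption that `samples_ms` comes from your metrics backend are all placeholders; substitute your own alert source.

```python
# A minimal sketch of an explicit detection signal: a metric-threshold
# check the agent (or its scheduler) can evaluate on every poll.
from dataclasses import dataclass

@dataclass
class Detection:
    fired: bool
    reason: str

def check_latency(samples_ms: list[float], threshold_ms: float = 200.0,
                  sustained_points: int = 5) -> Detection:
    """Fire when the last N p99 samples all exceed the threshold."""
    window = samples_ms[-sustained_points:]
    if len(window) == sustained_points and all(s > threshold_ms for s in window):
        return Detection(True, f"p99 > {threshold_ms}ms for {sustained_points} consecutive samples")
    return Detection(False, "within threshold")
```

The point is not the polling mechanics but that the firing condition is written down, not left to whoever happens to be watching the dashboard.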
If the reasoning layer is just "check the dashboard," the runbook is under-specified for an agent. Either spell out the reasoning explicitly, or accept that this runbook stays human-driven and pick a different one.
Map the runbook actions to tool calls
Each numbered step in the action layer becomes a tool the agent can call. "Restart the connection pool" is one tool. "Roll back the last deploy" is another. The mapping is mechanical: list the steps, list the tools, draw lines.
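As a sketch of what that mapping can look like, the registry below gives each runbook step a name and a description the agent can read. The tool names and bodies (`restart_connection_pool`, `rollback_deploy`) are hypothetical stubs, not a real orchestration API.

```python
# A sketch of the step-to-tool mapping: each numbered runbook step becomes
# one named tool with a description the agent can read. The tool names and
# bodies are hypothetical stubs; wire them to your real systems.
from typing import Callable

TOOLS: dict[str, tuple[str, Callable[..., str]]] = {}

def tool(name: str, description: str):
    def register(fn: Callable[..., str]) -> Callable[..., str]:
        TOOLS[name] = (description, fn)
        return fn
    return register

@tool("restart_connection_pool", "Runbook step 3: restart the service's DB connection pool")
def restart_connection_pool(service: str) -> str:
    return f"restarted pool for {service}"  # stub: call your orchestrator here

@tool("rollback_deploy", "Runbook step 5: roll back the most recent deploy")
def rollback_deploy(service: str) -> str:
    return f"rolled back last deploy of {service}"  # stub: call your deploy system
```

Keeping the description next to the callable keeps the agent's tool list and the runbook's action list in one-to-one correspondence.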
Steps that need human judgement ("decide whether to page the database team") become decision points the agent surfaces, not actions it takes. The agent's job is to gather evidence and present it; the human's job is to decide.
Steps that include irreversible commands (drop, delete, terminate) get an explicit two-person approval gate. The agent can propose; only a second pair of eyes signs off. Ship the gate before the action.
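One way to encode both rules, sketched under assumed names: the agent emits a proposal carrying its evidence (the decision point), and irreversible actions refuse to run without two distinct approvers. `terminate_instance` and the approver names are made up for illustration.

```python
# A sketch of both gates: the agent surfaces a Proposal with its evidence
# (the decision point), and irreversible actions refuse to run without two
# distinct approvers. terminate_instance and the approver names are made up.
from dataclasses import dataclass, field

@dataclass
class Proposal:
    action: str
    evidence: list[str]
    approvals: set[str] = field(default_factory=set)

    def approve(self, who: str) -> None:
        self.approvals.add(who)

    def execute(self, fn, *args):
        if len(self.approvals) < 2:  # two-person gate, shipped before the action
            raise PermissionError(f"{self.action}: needs 2 approvals, has {len(self.approvals)}")
        return fn(*args)

def terminate_instance(instance_id: str) -> str:
    return f"terminated {instance_id}"  # stub: call your cloud API here

p = Proposal("terminate_instance", evidence=["p99 regression began at the last deploy"])
p.approve("oncall-primary")
p.approve("oncall-secondary")   # second pair of eyes
p.execute(terminate_instance, "i-hypothetical")
```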
Write the system prompt from the reasoning layer
The reasoning layer of the runbook becomes the spine of the system prompt. "This usually means Y" turns into "hypotheses to evaluate," with each hypothesis paired with the evidence the agent should gather to confirm or rule it out.
Keep the prompt concrete. Include the exact metrics, the exact log patterns, the exact thresholds. Vague prompts produce vague reasoning. "High latency" is vague; "p99 > 200ms sustained for >5 minutes" is concrete.
Add a refusal clause for cases the runbook does not cover. "If the symptoms do not match any hypothesis above, say so and hand off to the human on-call." Refusal is a feature, not a failure mode.
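Putting the last three points together, here is what such a prompt can look like. The service name, hypotheses, thresholds, and log patterns are illustrative placeholders built from the examples above, not a recommended template.

```python
# A sketch of a system prompt assembled from the reasoning layer. The
# service name, hypotheses, thresholds, and log patterns are placeholders.
SYSTEM_PROMPT = """\
You triage p99 latency alerts for service checkout-api.
Trigger: p99 > 200ms sustained for >5 minutes.

Hypotheses to evaluate, each with the evidence that confirms or rules it out:
1. Connection pool exhaustion. Confirm: db_pool_active / db_pool_max > 0.9
   and the log pattern "pool timeout" within the last 10 minutes.
2. Bad deploy. Confirm: regression onset within 15 minutes of the latest
   deploy. Rule out: latency was already rising before the deploy landed.
3. Upstream slowdown. Confirm: the upstream dependency's p99 rose first.

If the symptoms do not match any hypothesis above, say so and hand off to
the human on-call. Do not guess.
"""
```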
Build the eval suite from past incidents
Pull the last ten incidents that this runbook applied to. Each becomes a test case: the input is the alert payload and the metrics window; the expected output is what the human on-call concluded.
Replay the cases against the agent. Score each case on three questions: did it surface the right hypothesis, did it surface the right evidence, did it propose the right action? Three scores per case; track them across versions.
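A minimal replay harness, assuming your agent can be invoked as a function: each past incident becomes a `Case`, and `score` returns the three booleans. `run_agent` and the result keys it returns are assumptions about your agent's interface.

```python
# A sketch of the replay harness: one Case per past incident, three boolean
# scores per case. run_agent and the result keys it returns are assumptions
# about your agent's interface.
from dataclasses import dataclass

@dataclass
class Case:
    alert_payload: dict
    metrics_window: list[float]
    expected_hypothesis: str
    expected_evidence: list[str]
    expected_action: str

def score(case: Case, run_agent) -> dict[str, bool]:
    result = run_agent(case.alert_payload, case.metrics_window)
    return {
        "right_hypothesis": result["hypothesis"] == case.expected_hypothesis,
        "right_evidence": set(case.expected_evidence) <= set(result["evidence"]),
        "right_action": result["proposed_action"] == case.expected_action,
    }
```

Summing the booleans across cases gives you a per-version scoreboard to track as the prompt and tools evolve.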
Cases the agent fails are signal. Either the prompt is missing context, the tools are missing data, or the agent is genuinely overreaching. Each failure mode has a different fix; diagnose before patching.
Retire what the agent now handles
The runbook itself can be shortened once the agent handles the rote parts. Leave only the parts a human still does: the judgement calls, the cross-team coordination, the cases the agent escalates.
Update the runbook to point at the agent. "For p99 latency on service X, invoke the triage agent first; it returns a hypothesis in 6 seconds." The human's first move is now reading the agent's output, not running the runbook.
Track which runbooks have been translated. When two-thirds of a service's runbooks are agent-handled, the on-call burden has materially shifted. That is the milestone worth celebrating.
What to do this week
Pick a 20-line runbook that gets used at least weekly. Spend an afternoon on the translation. Replay last month's incidents against the agent; if it would have produced the right hypothesis on at least 7 of 10, ship it as read-only. If not, the runbook itself probably needs revision before the agent can succeed.