The Strict Runbook-Attached Rule
Every alert has a runbook URL or it doesn't ship. What follows covers the rule, what a runbook must contain, and how to enforce it.
The strict rule
No alert ships to paging without a runbook URL. Not a wiki landing page; a specific runbook for this alert.
CI checks that the URL is present and returns HTTP 200. The PR fails if either check fails.
Stub runbooks ("investigate the issue") are rejected in review. The runbook must list the first 3 actions.
What the runbook contains
Confirmation: how to verify the alert is real (not a false positive). Specific commands or queries.
First actions: what to do in the first 5 minutes. Restart? Failover? Page someone else?
Escalation: when to page the next person, and who that is. Include their team and timezone.
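A review-time check for these three sections could be sketched as follows. It assumes runbooks are plain-text or Markdown files and that each section starts with a line naming it; the heading names and the `missing_sections` helper are illustrative, not an existing tool.

```python
import re

# Assumed section headings; adjust these to match your runbook template.
REQUIRED_SECTIONS = ("Confirmation", "First actions", "Escalation")


def missing_sections(runbook_text: str) -> list:
    """Return the required sections that never appear at the start of a line."""
    missing = []
    for name in REQUIRED_SECTIONS:
        # Match a line beginning with the section name, optionally after '#' marks.
        pattern = re.compile(
            rf"^\s*#*\s*{re.escape(name)}\b",
            re.IGNORECASE | re.MULTILINE,
        )
        if not pattern.search(runbook_text):
            missing.append(name)
    return missing
```

A check like this only proves the sections exist. Whether the commands and escalation contacts in them are usable still needs a human reviewer.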
Keeping runbooks current
Runbooks rot. Quarterly review by the owning team; update the runbook or retire the alert.
After every incident, update the runbook with what was actually done. The runbook is the cumulative knowledge of the team.
Block PR approvals on stale runbook reviews. If the alert has fired and its runbook hasn't been touched in 6 months, the PR fails until the runbook is reviewed.
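The staleness condition can be expressed as a small predicate. This is a minimal sketch under two assumptions: "6 months" is approximated as 182 days, and "the alert fired" means it fired at some point after the runbook's last update. The function name and its inputs are hypothetical; where the dates come from (version control, the alerting system) is left to the caller.

```python
from datetime import date, timedelta
from typing import Optional

STALE_AFTER = timedelta(days=182)  # assumption: roughly 6 months


def runbook_is_stale(last_updated: date, last_fired: Optional[date], today: date) -> bool:
    """True when the alert fired after the last update and the runbook is too old."""
    if last_fired is None:
        # The alert never fired, so the rule does not apply.
        return False
    untouched_too_long = today - last_updated > STALE_AFTER
    fired_since_update = last_fired > last_updated
    return untouched_too_long and fired_since_update
```

A CI step could call this for each alert touched by the PR and fail the build when it returns `True`.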
How to review a runbook
Could a new on-call execute it without asking questions? If not, it's not a runbook.
Are the commands current? Tools and APIs change; commands break silently.
Is the escalation path correct? Team names change, schedules change, people leave.
How to enforce
Linter on the alert config repo. Required field: `runbook_url`. CI runs `curl -fsS` against the URL; `-f` makes curl exit non-zero on HTTP 4xx and 5xx responses.
Quarterly audit of all runbook URLs. Broken URLs file tickets to the owning team.
Make runbook quality part of the alert review. "Reviewer approved that the runbook is sufficient" is a checkbox.
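The linter and URL check described above might look like the sketch below. It assumes each alert definition has already been parsed into a dict with `name` and `runbook_url` keys; the function names are hypothetical, and the "site root means landing page" rule is a heuristic added for illustration. The reachability check uses Python's standard `urllib` in place of `curl`.

```python
from urllib.parse import urlparse
import urllib.error
import urllib.request


def lint_alert(alert: dict) -> list:
    """Return lint errors for one alert definition (empty list means pass)."""
    name = alert.get("name", "<unnamed>")
    url = alert.get("runbook_url")
    if not isinstance(url, str) or not url.strip():
        return [f"{name}: missing required field runbook_url"]
    parsed = urlparse(url.strip())
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        return [f"{name}: runbook_url is not an http(s) URL"]
    # Heuristic (assumption): a bare host or root path is a landing page,
    # not a runbook for this specific alert.
    if parsed.path in ("", "/"):
        return [f"{name}: runbook_url points at a site root, not a specific runbook"]
    return []


def url_returns_200(url: str, timeout: float = 10.0) -> bool:
    """Fetch the URL and report whether the final response status is 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, ValueError, OSError):
        # HTTP errors, DNS failures, timeouts, and malformed URLs all count as failure.
        return False
```

Note that `urlopen` follows redirects, so a URL that redirects to a login page returning 200 would still pass. A status-only check confirms the link resolves, not that the runbook is readable or sufficient, which is why the reviewer checkbox remains necessary.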