The Strict Runbook-Attached Rule
Every alert has a runbook URL or it doesn't ship. Here is how to enforce that.
The strict rule
The rule is uncompromising. No alert ships to paging without a runbook URL, and that URL must point to a specific runbook for this alert, not a wiki landing page. CI checks that the URL is present and returns 200, and the PR fails if either check fails. Stub runbooks (“investigate the issue”) are rejected in review because the runbook must list the first 3 actions.
- Specific runbook URL required. Not a wiki landing page; a specific runbook for this alert.
- CI presence and 200 check. URL present and returns 200; PR fails if either check fails.
- Stub runbooks rejected. “Investigate the issue” not enough; must list first 3 actions.
- Per-alert runbook ownership. The runbook has an owner; supports the maintenance discipline.
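The static half of this rule can be sketched as a small lint function. This is a minimal sketch, not a definitive implementation: the alert is assumed to be a plain dict, the runbook_owner field name and the LANDING_PATHS set are hypothetical, and the landing-page test is only a path heuristic that a reviewer still has to back up.

```python
from urllib.parse import urlparse

# Hypothetical landing-page paths; adjust to your own wiki layout.
LANDING_PATHS = {"", "/wiki", "/runbooks"}


def lint_alert(alert: dict) -> list:
    """Return a list of problems; an empty list means the alert passes."""
    problems = []

    url = (alert.get("runbook_url") or "").strip()
    if not url:
        problems.append("missing runbook_url")
    else:
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            problems.append("runbook_url is not an absolute http(s) URL")
        elif parsed.path.rstrip("/") in LANDING_PATHS:
            # Heuristic only: a bare site root or index path is treated
            # as a landing page rather than a specific runbook.
            problems.append("runbook_url points at a landing page")

    # "runbook_owner" is an assumed field name for the per-alert owner.
    if not (alert.get("runbook_owner") or "").strip():
        problems.append("missing runbook_owner")

    return problems
```

A CI job could call lint_alert on every alert in the config repo and fail the PR when any list is non-empty. The 200 check is a separate network step and is not covered here.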
What the runbook contains
The runbook has three required sections. Confirmation (how to verify the alert is real, not a false positive, with specific commands or queries); first actions (what to do in the first 5 minutes: restart, failover, page someone else); escalation (when to page the next person and who they are, with team and timezone).
- Confirmation. How to verify the alert is real, not a false positive; specific commands or queries.
- First actions. First 5 minutes; restart, failover, page someone else.
- Escalation. When to page the next person, who they are; include their team and timezone.
- Per-section length cap. Each section has a length cap to keep the runbook usable; supports speed at incident time.
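The required sections and the length cap can also be checked mechanically. The sketch below assumes a plain-text runbook where each section name sits on its own line and actions are written as "- " items. The format, the 15-line cap, and the function names are illustrative assumptions, not part of the rule itself.

```python
REQUIRED_SECTIONS = ("Confirmation", "First actions", "Escalation")
MAX_LINES_PER_SECTION = 15  # hypothetical cap; pick what suits your team
MIN_FIRST_ACTIONS = 3       # the rule asks for the first 3 actions


def split_sections(text: str) -> dict:
    """Group non-empty lines under the most recent required heading."""
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line in REQUIRED_SECTIONS:
            current = line
            sections[current] = []
        elif current is not None and line:
            sections[current].append(line)
    return sections


def check_runbook(text: str) -> list:
    """Return a list of problems; an empty list means the structure is fine."""
    problems = []
    sections = split_sections(text)

    for name in REQUIRED_SECTIONS:
        body = sections.get(name)
        if not body:
            problems.append(f"section '{name}' is missing or empty")
        elif len(body) > MAX_LINES_PER_SECTION:
            problems.append(f"section '{name}' exceeds {MAX_LINES_PER_SECTION} lines")

    # Only count actions when the section exists, to avoid double reporting.
    first_actions = sections.get("First actions")
    if first_actions:
        actions = [line for line in first_actions if line.startswith("- ")]
        if len(actions) < MIN_FIRST_ACTIONS:
            problems.append(
                f"'First actions' must list at least {MIN_FIRST_ACTIONS} actions, "
                f"found {len(actions)}"
            )
    return problems
```

A check like this catches empty or one-line stubs, but it cannot judge whether the listed actions are useful. That judgment stays with the human reviewer.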
Keeping runbooks current
Runbooks rot, so the maintenance loop must close. The owning team reviews each runbook quarterly and either updates it or retires the alert. After every incident, the runbook is updated with what was actually done, because the runbook is the cumulative knowledge of the team. PRs fail when a runbook has gone untouched for 6 months while its alert has fired.
- Quarterly review. Owning team updates the runbook or retires the alert.
- Post-incident update. What was actually done lands in the runbook; cumulative knowledge.
- Stale runbook blocks PR. If the runbook hasn’t been touched in 6 months and the alert has fired, the PR fails.
- Per-runbook freshness check. CI surfaces runbooks past their review window; supports the cadence.
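The two time-based checks above reduce to date arithmetic. The sketch below makes its assumptions explicit: "6 months" is approximated as 182 days, "quarterly" as 91 days, and "the alert fired" is read as "fired since the runbook was last touched". All three are interpretations chosen for illustration, and the function names are hypothetical.

```python
from datetime import date, timedelta
from typing import Optional

STALE_AFTER = timedelta(days=182)   # roughly 6 months (assumption)
REVIEW_WINDOW = timedelta(days=91)  # roughly one quarter (assumption)


def needs_review(last_reviewed: date, today: date) -> bool:
    """True when the runbook is past its quarterly review window."""
    return today - last_reviewed > REVIEW_WINDOW


def blocks_pr(last_touched: date, last_fired: Optional[date], today: date) -> bool:
    """True when the runbook is stale AND the alert fired after the last edit."""
    stale = today - last_touched > STALE_AFTER
    fired_since = last_fired is not None and last_fired > last_touched
    return stale and fired_since
```

In practice last_touched might come from version-control history and last_fired from the alerting system; where those values come from is left open here.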
How to review a runbook
The review checklist is short and concrete. Could a new on-call execute it without asking questions (if not, it’s not a runbook); are the commands current (tools and APIs change, commands break silently); is the escalation path correct (team names change, schedules change, people leave).
- New-on-call test. Could a new on-call execute it without asking questions; if not, it’s not a runbook.
- Commands current. Tools and APIs change; commands break silently; verify they still work.
- Escalation correct. Team names change, schedules change, people leave; the escalation must follow.
- Per-review documented decision. The review outcome (pass, update, retire) committed; supports auditability.
How to enforce
Enforcement makes the rule real. Linter on the alert config repo with required runbook_url field and CI running curl -fsS against the URL; quarterly audit of all runbook URLs that files tickets for broken URLs; runbook quality as part of the alert review with “Reviewer approved that the runbook is sufficient” as a checkbox.
- Linter required field. runbook_url required; CI runs curl -fsS against the URL.
- Quarterly URL audit. Broken URLs file tickets to the owning team; the audit closes the loop.
- Reviewer checkbox. “Reviewer approved that the runbook is sufficient”; the human review remains.
- Per-org enforcement policy. The discipline documented; supports onboarding new teams.
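For teams that prefer to keep the reachability check in the same tool as the linter, the curl step can be approximated with the Python standard library. This is a sketch under stated assumptions: alerts are plain dicts with hypothetical name and runbook_url keys, and the fetch parameter exists only so the audit can be tested without network access. Unlike curl -fsS without -L, urllib follows redirects, so a URL that redirects to a 200 page passes here.

```python
import urllib.error
import urllib.request


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the final HTTP status of a GET, or 0 if the request failed."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx and 5xx responses arrive as exceptions; keep the real code.
        return err.code
    except (OSError, ValueError):
        # DNS failures, refused connections, timeouts, malformed URLs.
        return 0


def broken_runbooks(alerts, fetch=fetch_status):
    """Return (alert name, url, status) for every runbook URL that is not 200."""
    failures = []
    for alert in alerts:
        url = alert.get("runbook_url", "")
        status = fetch(url) if url else 0
        if status != 200:
            failures.append((alert.get("name", "<unnamed>"), url, status))
    return failures
```

The same function can serve both uses described above: CI fails the PR when the returned list is non-empty, and the quarterly audit files one ticket per entry to the owning team.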