The Strict Runbook-Attached Rule

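Every alert has a runbook URL or it doesn't ship. This section covers the rule, what a runbook must contain, how runbooks stay current, and how the rule is enforced.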

The strict rule

The rule is uncompromising. No alert ships to paging without a runbook URL (not a wiki landing page; a specific runbook for this alert); CI checks the URL is present and returns 200, PR fails if either check fails; stub runbooks (“investigate the issue”) are rejected in review because the runbook must list the first 3 actions.

Specific runbook URL required. Not a wiki landing page; a specific runbook for this alert.
CI presence and 200 check. URL present and returns 200; PR fails if either check fails.
Stub runbooks rejected. “Investigate the issue” not enough; must list first 3 actions.
Per-alert runbook ownership. Each runbook has a named owner, so someone is accountable for keeping it maintained.
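A minimal sketch of the CI gate in Python, assuming alerts are loaded as dicts with hypothetical runbook_url and runbook_owner keys, and that a bare wiki root or a /wiki or /runbooks index counts as a landing page (the LANDING_PATHS set is an assumption to adjust to your wiki layout). The HTTP probe is injected so the check can be tested offline. Note that curl -fsS fails only on HTTP 400 and above, so it is looser than a strict 200 check; this sketch compares the status to 200 explicitly (urllib follows redirects, so it is the status of the final page).

```python
import urllib.error
import urllib.request
from typing import Callable
from urllib.parse import urlparse

# Hypothetical: URL paths that point at a wiki index rather than a specific runbook.
LANDING_PATHS = {"", "/wiki", "/runbooks"}


def http_status(url: str, timeout: float = 10.0) -> int:
    """Status of the final response after redirects; 0 if unreachable or malformed."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code
    except (urllib.error.URLError, OSError, ValueError):
        return 0


def check_alert(alert: dict, status_of: Callable[[str], int] = http_status) -> list[str]:
    """Return rule violations for one alert; an empty list means it may ship."""
    name = alert.get("name", "<unnamed>")
    url = alert.get("runbook_url")
    if not url:
        return [f"{name}: runbook_url is missing"]
    errors = []
    # Reject links to a wiki index instead of a runbook for this specific alert.
    if urlparse(url).path.rstrip("/") in LANDING_PATHS:
        errors.append(f"{name}: {url} is a landing page, not a runbook for this alert")
    status = status_of(url)
    if status != 200:
        errors.append(f"{name}: {url} returned {status}, expected 200")
    if not alert.get("runbook_owner"):
        errors.append(f"{name}: runbook has no owner")
    return errors
```

In CI, run check_alert over every alert in the repo and exit non-zero if any alert returns a non-empty list.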

What the runbook contains

The runbook has three required sections. Confirmation (how to verify the alert is real, not a false positive, with specific commands or queries); first actions (what to do in the first 5 minutes: restart, failover, page someone else); escalation (when to page the next person and who they are, with team and timezone).

Confirmation. How to verify the alert is real, not a false positive; specific commands or queries.
First actions. First 5 minutes; restart, failover, page someone else.
Escalation. When to page the next person, who they are; include their team and timezone.
Per-section length cap. Each section has a length cap, so the runbook stays fast to read at incident time.
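A sketch of a runbook structure check in Python, assuming runbooks are Markdown files whose required sections use exactly these ## headings. The 1,500-character cap is an illustrative number, not part of the rule. The check also enforces the stub rule by requiring at least three list items under First actions.

```python
import re

# Hypothetical conventions: Markdown runbooks with these exact "## " headings.
REQUIRED_SECTIONS = ("Confirmation", "First actions", "Escalation")
MAX_SECTION_CHARS = 1500  # illustrative cap; tune to your incident tooling
MIN_FIRST_ACTIONS = 3


def split_sections(text: str) -> dict[str, str]:
    """Map each '## Heading' to the stripped body text beneath it."""
    sections: dict[str, str] = {}
    current = None
    lines: list[str] = []
    for line in text.splitlines():
        match = re.match(r"^##\s+(.+?)\s*$", line)
        if match:
            if current is not None:
                sections[current] = "\n".join(lines).strip()
            current, lines = match.group(1), []
        elif current is not None:
            lines.append(line)
    if current is not None:
        sections[current] = "\n".join(lines).strip()
    return sections


def check_runbook(text: str) -> list[str]:
    """Return structural problems; an empty list means the runbook passes."""
    sections = split_sections(text)
    errors = []
    for name in REQUIRED_SECTIONS:
        body = sections.get(name)
        if not body:
            errors.append(f"missing or empty section: {name}")
        elif len(body) > MAX_SECTION_CHARS:
            errors.append(f"section {name} is {len(body)} chars; cap is {MAX_SECTION_CHARS}")
    # Count bulleted or numbered steps; "Investigate the issue." alone counts as zero.
    actions = [
        line for line in sections.get("First actions", "").splitlines()
        if re.match(r"^\s*(?:[-*]|\d+\.)\s+\S", line)
    ]
    if len(actions) < MIN_FIRST_ACTIONS:
        errors.append(f"First actions lists {len(actions)} step(s); need at least {MIN_FIRST_ACTIONS}")
    return errors
```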

Keeping runbooks current

Runbooks rot, so the maintenance loop must close. The owning team reviews each runbook quarterly (update the runbook or retire the alert); after every incident, the runbook is updated with what was actually done, because the runbook is the cumulative knowledge of the team; and a runbook that has gone stale while its alert has fired blocks PRs.

Quarterly review. Owning team updates the runbook or retires the alert.
Post-incident update. What was actually done lands in the runbook; cumulative knowledge.
Stale runbook blocks PR. If the runbook hasn’t been touched in 6 months and the alert has fired, the PR fails.
Per-runbook freshness check. CI surfaces runbooks past their review window, so the quarterly cadence doesn’t depend on anyone remembering it.
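A sketch of the freshness gate in Python. Assumptions: the last-touched date would come from the runbook’s last commit and the last-fired date from your alerting history (both are hypothetical inputs here), six months is approximated as 183 days and a quarter as 91, and “the alert has fired” is read as “fired within the last six months”. Runbooks past the quarterly window are surfaced as warnings; only stale runbooks for alerts that fired fail the PR.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

REVIEW_WINDOW = timedelta(days=91)  # roughly one quarter: surface as a warning
STALE_AFTER = timedelta(days=183)   # roughly six months: fail if the alert also fired


@dataclass
class RunbookState:
    alert: str
    last_touched: date
    last_fired: Optional[date]  # None if the alert has never fired


def freshness(state: RunbookState, today: date) -> str:
    """Classify a runbook as 'ok', 'warn' (past review window) or 'fail'."""
    age = today - state.last_touched
    fired_recently = state.last_fired is not None and today - state.last_fired <= STALE_AFTER
    if age > STALE_AFTER and fired_recently:
        return "fail"
    if age > REVIEW_WINDOW:
        return "warn"
    return "ok"


def failing_alerts(states: list[RunbookState], today: date) -> list[str]:
    """Alerts whose stale runbooks should fail the PR."""
    return [s.alert for s in states if freshness(s, today) == "fail"]
```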

How to review a runbook

The review checklist is short and concrete. Could a new on-call execute it without asking questions (if not, it’s not a runbook); are the commands current (tools and APIs change, commands break silently); is the escalation path correct (team names change, schedules change, people leave).

New-on-call test. Could a new on-call execute it without asking questions; if not, it’s not a runbook.
Commands current. Tools and APIs change; commands break silently; verify they still work.
Escalation correct. Team names change, schedules change, people leave; the escalation must follow.
Per-review documented decision. The review outcome (pass, update, or retire) is committed, which makes every review auditable.
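A sketch of a committed review record in Python. The field names, and the rule that a pass requires all three checklist items, are assumptions layered on the checklist above; the JSON line is meant to be committed, for example next to the alert config, so the decision is auditable.

```python
import json
from dataclasses import asdict, dataclass
from datetime import date
from enum import Enum


class Outcome(str, Enum):
    PASS = "pass"
    UPDATE = "update"
    RETIRE = "retire"


@dataclass
class RunbookReview:
    alert: str
    reviewer: str
    reviewed_on: date
    new_oncall_can_execute: bool  # new-on-call test
    commands_verified: bool       # commands still work
    escalation_verified: bool     # escalation path still correct
    outcome: Outcome

    def validate(self) -> None:
        """A runbook cannot pass review with a failed checklist item."""
        checks = (self.new_oncall_can_execute, self.commands_verified, self.escalation_verified)
        if self.outcome is Outcome.PASS and not all(checks):
            raise ValueError(f"{self.alert}: cannot pass with a failed checklist item")

    def to_json(self) -> str:
        """Serialize the validated decision as one JSON line to commit."""
        self.validate()
        record = asdict(self)
        record["reviewed_on"] = self.reviewed_on.isoformat()
        record["outcome"] = self.outcome.value
        return json.dumps(record, sort_keys=True)
```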

How to enforce

Enforcement makes the rule real. A linter on the alert config repo requires a runbook_url field, and CI runs curl -fsS against the URL; a quarterly audit of all runbook URLs files tickets for broken ones; and runbook quality is part of the alert review, with “Reviewer approved that the runbook is sufficient” as a checkbox.

Linter required field. runbook_url required; CI runs curl -fsS against the URL.
Quarterly URL audit. Broken URLs file tickets to the owning team; the audit closes the loop.
Reviewer checkbox. “Reviewer approved that the runbook is sufficient”; the human review remains.
Per-org enforcement policy. The discipline is documented per org, so new teams can be onboarded to it.
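A sketch of the quarterly URL audit in Python: it probes every runbook URL and groups failures by owning team, one ticket per team. The status_of probe is injected (in practice it would wrap the same HTTP check CI uses) and the runbook_owner field is hypothetical; filing the ticket depends on your tracker, so the sketch stops at rendering ticket bodies.

```python
from collections import defaultdict
from typing import Callable, Iterable


def audit(alerts: Iterable[dict], status_of: Callable[[str], int]) -> dict[str, list[str]]:
    """Group broken or missing runbook URLs by owning team."""
    broken: dict[str, list[str]] = defaultdict(list)
    for alert in alerts:
        url = alert.get("runbook_url", "")
        status = status_of(url) if url else 0  # 0 marks a missing URL
        if status != 200:
            team = alert.get("runbook_owner", "unowned")
            name = alert.get("name", "<unnamed>")
            broken[team].append(f"{name}: {url or '<missing>'} -> {status}")
    return dict(broken)


def ticket_bodies(broken: dict[str, list[str]]) -> list[tuple[str, str]]:
    """Render one (team, body) pair per owning team, sorted by team name."""
    return [
        (team, "Broken runbook URLs found by the quarterly audit:\n" + "\n".join(lines))
        for team, lines in sorted(broken.items())
    ]
```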