By Samson Tanimawo, PhD. Published May 1, 2026.

The Strict Runbook-Attached Rule

Every alert has a runbook URL or it doesn't ship. Here is how to enforce that.

The strict rule

No alert ships to paging without a runbook URL. Not a wiki landing page; a specific runbook for this alert.

CI checks that the URL is present and that it returns HTTP 200. The PR fails if either check fails.

Stub runbooks ("investigate the issue") are rejected in review. The runbook must list the first 3 actions.
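The offline half of this rule can be sketched as a small lint function. This is a minimal sketch, assuming alert rules are loaded as dictionaries with a `runbook_url` field; the `name` key, the `LANDING_PATHS` set, and the `lint_alert` function are hypothetical names chosen for illustration. It does not replace the HTTP check or the human review for stubs.

```python
from urllib.parse import urlparse

# Hypothetical paths treated as generic landing pages rather than specific runbooks.
LANDING_PATHS = {"", "/", "/wiki", "/wiki/", "/runbooks", "/runbooks/"}


def lint_alert(alert: dict) -> list:
    """Return a list of problems with an alert's runbook_url (empty means it passes)."""
    name = alert.get("name", "<unnamed>")
    url = alert.get("runbook_url")
    if not isinstance(url, str) or not url.strip():
        return [f"{name}: missing runbook_url"]

    parsed = urlparse(url.strip())
    problems = []
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        problems.append(f"{name}: runbook_url is not an absolute http(s) URL")
    elif parsed.path in LANDING_PATHS and not parsed.fragment:
        # A bare site root or index page is not a runbook for this alert.
        problems.append(f"{name}: runbook_url points at a landing page, not a specific runbook")
    return problems
```

A CI job could run this over every alert in the repo and fail the PR when any list is non-empty.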

What the runbook contains

Confirmation: how to verify the alert is real (not a false positive). Specific commands or queries.

First actions: what to do in the first 5 minutes. Restart? Failover? Page someone else?

Escalation: when to page the next person, and who that is. Include their team and timezone.

Keeping runbooks current

Runbooks rot. Quarterly review by the owning team; update the runbook or retire the alert.

After every incident, update the runbook with what was actually done. The runbook is the cumulative knowledge of the team.

Block PR approvals on stale runbook reviews. If the runbook hasn't been touched in 6 months and the alert has fired in that time, the PR fails.
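The staleness gate reduces to a date comparison. This is a sketch under two assumptions: "6 months" is approximated as 182 days, and "the alert fired" is read as "fired at some point after the runbook was last touched". The function name and its parameters are hypothetical; where the two dates come from (git history, the alerting system) is left open.

```python
from datetime import date, timedelta
from typing import Optional

STALE_AFTER = timedelta(days=182)  # assumption: roughly 6 months


def blocks_pr(last_touched: date, last_fired: Optional[date], today: date) -> bool:
    """True when the runbook is stale and the alert has fired since it was last touched."""
    if last_fired is None:
        # An alert that never fired gives no evidence the runbook is out of date.
        return False
    untouched_too_long = today - last_touched > STALE_AFTER
    fired_since_update = last_fired > last_touched
    return untouched_too_long and fired_since_update
```

Passing `today` in explicitly, rather than calling `date.today()` inside, keeps the check deterministic in CI and in tests.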

How to review a runbook

Could a new on-call engineer execute it without asking questions? If not, it's not a runbook.

Are the commands current? Tools and APIs change; commands break silently.

Is the escalation path correct? Team names change, schedules change, people leave.

How to enforce

Linter on the alert config repo. Required field: `runbook_url`. CI runs `curl -fsS` against the URL; the `-f` flag makes curl exit non-zero on HTTP 4xx and 5xx responses.

Quarterly audit of all runbook URLs. Broken URLs file tickets to the owning team.
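The quarterly audit can be sketched as a loop that probes each URL and groups failures by owner, so that one ticket goes to each team. The `owner` and `name` keys and both function names are hypothetical. `url_ok` is only a rough stand-in for `curl -fsS`: unlike curl without `-L`, `urllib` follows redirects, and this version requires the final status to be exactly 200.

```python
from collections import defaultdict
from typing import Callable, Iterable
import urllib.error
import urllib.request


def url_ok(url: str, timeout: float = 10.0) -> bool:
    """Return True only if the URL can be fetched and the final status is 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, ValueError, OSError):
        # HTTP 4xx/5xx, DNS failures, timeouts, and malformed URLs all count as broken.
        return False


def audit(alerts: Iterable[dict], probe: Callable[[str], bool] = url_ok) -> dict:
    """Group the names of alerts with missing or broken runbook URLs by owning team."""
    broken = defaultdict(list)
    for alert in alerts:
        url = alert.get("runbook_url", "")
        if not url or not probe(url):
            broken[alert.get("owner", "unowned")].append(alert["name"])
    return dict(broken)
```

Taking the probe as a parameter lets the audit be tested without network access, for example with `audit(alerts, probe=lambda url: False)` to see how every alert would be reported.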

Make runbook quality part of the alert review. "Reviewer approved that the runbook is sufficient" is a checkbox.