Runbook Quality: The 3am Test
A runbook is only as good as its 3am usability. The test is mechanical; the discipline is the cadence.
Why most runbooks fail at 3am
Runbooks written by experts; read by sleepy junior engineer at 3am.
Most fail because they assume context the reader does not have.
Four 3am properties
- 1. Copy-paste-ready commands.
- 2. No conditionals (or trivially flat).
- 3. Linked from the alert itself.
- 4. Owned by a team, with a quarterly review date.
Maintenance cadence
Quarterly: each runbook reviewed by the on-call team.
Drift caught in review; lessons fed back to runbook author.
Drift testing
CI test: walk runbook commands; verify they exist; verify links resolve.
Synthetic test in production: rehearse one runbook quarterly.
Antipatterns
- Runbooks in a wiki. Search broken at 3am.
- Runbook with placeholders. Risk of wrong substitution.
- No quarterly review. Drift wins.
What to do this week
Three moves. (1) Apply this practice to your next on-call rotation. (2) Survey the team after one cycle. (3) Iterate based on feedback; the discipline is the cadence.