The Runbook Grade: A Self-Assessment for Quality

Score your runbooks on a 1-5 scale across five dimensions. A runbook that scores below 3 is technical debt; one that scores 5 is rare.

The five dimensions

Each runbook is scored from 1 to 5 on five dimensions. Together, the scores reveal whether the runbook is a working tool or a comforting fiction.

Specificity. Vague versus concrete. "Restart the service" scores 1; "kubectl rollout restart deployment/api -n prod" scores 5.
Completeness. Covers symptom, diagnosis, action, verification, and rollback. Score 1 if any of them is missing; score 5 if all five are explicit.
Currency. How recently it was last updated. Anything older than 6 months is dated; older than 12 months is suspect; older than 24 months is fiction.
Author confidence. Written by someone who has actually run it.
Test runs. Used in the last quarter. Honest answers on both this and author confidence are the proof.
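Two of the dimensions come with rules concrete enough to automate. The sketch below is a minimal Python illustration, assuming a runbook's sections are recorded as a hypothetical mapping from section name to "explicit" or "implicit". The in-between Completeness score of 3 and the exact Currency scores per band (5 within 6 months, 3 when dated, 2 when suspect, 1 past 24 months) are assumptions; the rubric fixes only the Completeness endpoints and the age thresholds.

```python
from datetime import date

# Sections the Completeness dimension looks for.
REQUIRED_SECTIONS = ("symptom", "diagnosis", "action", "verification", "rollback")


def completeness_score(sections: dict[str, str]) -> int:
    """Score Completeness from a mapping of section name -> "explicit" | "implicit".

    Hypothetical input format: a section absent from the mapping is missing.
    """
    if any(name not in sections for name in REQUIRED_SECTIONS):
        return 1  # any section missing scores 1
    if all(sections[name] == "explicit" for name in REQUIRED_SECTIONS):
        return 5  # all five explicit scores 5
    return 3  # assumption: all present, but some only implied


def currency_score(last_updated: date, today: date) -> int:
    """Map the age of the last update to a 1-5 Currency score (assumed bands)."""
    # Age in whole calendar months; the day of the month is ignored.
    months = (today.year - last_updated.year) * 12 + (today.month - last_updated.month)
    if months > 24:
        return 1  # "fiction"
    if months > 12:
        return 2  # "suspect"
    if months > 6:
        return 3  # "dated"
    return 5  # updated within the last 6 months


# Example: verification is only implied, and the last edit was 29 months ago.
print(completeness_score({
    "symptom": "explicit", "diagnosis": "explicit", "action": "explicit",
    "verification": "implicit", "rollback": "explicit",
}))  # prints 3
print(currency_score(date(2022, 1, 15), date(2024, 6, 1)))  # prints 1
```

Specificity and the two evidence dimensions still need a human judgment; the point of automating the mechanical ones is that nobody can argue them upward.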

What scores mean

The composite score maps to a clear posture: below 3 is a liability, 3-4 is usable with experience, and 5 is rare and earned.

5: production-grade. Any on-call engineer, even a junior one, could follow it cold. Few runbooks earn this; it is the gold standard.
3-4: usable. Workable with experience, though a junior on-call would need to ask questions. This is the realistic majority.
1-2: dangerous. Following it blindly causes more incidents than it resolves; update or retire it immediately.
The honesty test. A score must be earned, not claimed; test runs are the most honest evidence that a runbook works.
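The rubric does not say how the five dimension scores combine into a composite. One simple option, assumed here rather than taken from the rubric, is the plain mean, with the posture bands applied to the result. The dimension keys are hypothetical names.

```python
# Hypothetical keys for the five dimensions.
DIMENSIONS = ("specificity", "completeness", "currency", "author_confidence", "test_runs")


def composite(scores: dict[str, int]) -> float:
    """Assumed composite: the plain mean of the five 1-5 dimension scores."""
    for dim in DIMENSIONS:
        if dim not in scores:
            raise ValueError(f"dimension not scored: {dim}")
        if not 1 <= scores[dim] <= 5:
            raise ValueError(f"{dim} must be between 1 and 5, got {scores[dim]}")
    return sum(scores[dim] for dim in DIMENSIONS) / len(DIMENSIONS)


def posture(score: float) -> str:
    """Apply the posture bands; anything from 3 up to (not including) 5 counts as usable."""
    if score < 3:
        return "dangerous"
    if score < 5:
        return "usable"
    return "production-grade"


scores = {"specificity": 5, "completeness": 3, "currency": 5,
          "author_confidence": 4, "test_runs": 2}
print(composite(scores), posture(composite(scores)))  # prints 3.8 usable
```

A plain mean lets one strong dimension hide a weak one. Taking the minimum instead is a stricter alternative if a single failing dimension, such as no recent test run, should sink the whole runbook.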

What to do with low scores

Low-scoring runbooks need policy, not just patience. Without a removal or refactor path, low scores accumulate and the runbook library becomes documentation that nobody trusts.

Score below 3. The runbook cannot be linked from a page; the on-call investigates from scratch instead.
Quarterly refactor list. The five lowest-scoring runbooks get a refactor; owners are assigned and deadlines tracked.
Earned, not claimed. A refactored runbook's new score must be defended with evidence, and a recent test run is the most honest evidence there is.
The flywheel. Quarterly review keeps the library honest; engineers learn the standard; new runbooks aim higher.
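Both policies reduce to a few lines once each runbook has a composite score. This sketch assumes the library is a hypothetical dict from runbook name to composite score; the link gate applies the below-3 rule and the refactor list takes the five lowest scores. Owner assignment and deadline tracking are left out.

```python
import heapq

LINK_THRESHOLD = 3  # below this composite score, a page may not link the runbook
REFACTOR_BATCH = 5  # how many of the lowest scores go on the quarterly list


def linkable(score: float) -> bool:
    """A runbook may be linked from a page only if it scores 3 or higher."""
    return score >= LINK_THRESHOLD


def quarterly_refactor_list(library: dict[str, float], n: int = REFACTOR_BATCH) -> list[str]:
    """Return the n lowest-scoring runbook names, lowest first (ties broken by name)."""
    lowest = heapq.nsmallest(n, library.items(), key=lambda item: (item[1], item[0]))
    return [name for name, _ in lowest]


# Hypothetical library of runbook names and composite scores.
library = {
    "db-failover": 2.2, "cache-flush": 4.6, "api-restart": 3.8, "dns-rollback": 1.4,
    "queue-drain": 2.8, "cert-renew": 3.0, "log-rotate": 4.0,
}
print([name for name, score in library.items() if not linkable(score)])
# prints ['db-failover', 'dns-rollback', 'queue-drain']
print(quarterly_refactor_list(library))
# prints ['dns-rollback', 'db-failover', 'queue-drain', 'cert-renew', 'api-restart']
```

Note that the refactor list can include runbooks that still pass the link gate: the bottom five are refactored every quarter, so the bar rises as the library improves.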