The Three-Page Rule for the On-Call Mental Model
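The on-call mental model fits on three pages, one each for live state, recent deploys, and escalation paths. Three is the right number because each page answers one question the on-call asks during an incident: what is broken, what changed, and who can help.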
Page 1: live state
The current state of every service the on-call owns: latency, error rate, and recent alerts. One row per service, each with a status icon.
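A minimal sketch of how one row per service might be derived, assuming hypothetical thresholds (`LATENCY_P99_WARN_MS`, `ERROR_RATE_WARN`, `ERROR_RATE_CRIT`) and plain-text icons in place of real ones. Rows are sorted worst-first so that anything in flight sits at the top of the screen. The names and cut-offs are illustrative, not a prescribed scheme; real values would come from each service's SLOs.

```python
from dataclasses import dataclass

# Hypothetical thresholds; real values would come from each service's SLOs.
LATENCY_P99_WARN_MS = 500.0
ERROR_RATE_WARN = 0.01
ERROR_RATE_CRIT = 0.05


@dataclass
class ServiceState:
    name: str
    p99_latency_ms: float
    error_rate: float  # fraction of failed requests, 0.0 to 1.0
    open_alerts: int


def status_icon(s: ServiceState) -> str:
    """Map one service's live metrics to a single status icon."""
    if s.error_rate >= ERROR_RATE_CRIT or s.open_alerts > 0:
        return "RED"
    if s.error_rate >= ERROR_RATE_WARN or s.p99_latency_ms >= LATENCY_P99_WARN_MS:
        return "YELLOW"
    return "GREEN"


def render_live_state(services: list[ServiceState]) -> list[str]:
    """One row per service, worst status first so incidents need no scrolling."""
    severity = {"RED": 0, "YELLOW": 1, "GREEN": 2}
    rows = sorted(services, key=lambda s: (severity[status_icon(s)], s.name))
    return [
        f"{status_icon(s):6} {s.name:12} p99={s.p99_latency_ms:.0f}ms "
        f"err={s.error_rate:.1%} alerts={s.open_alerts}"
        for s in rows
    ]
```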
Updated automatically; the on-call does not maintain it. A page that depends on manual upkeep decays over time, and a stale page cannot be trusted.
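Because the page is only trustworthy while its automated refresh keeps running, it can help to make staleness visible rather than silently showing old numbers. A small sketch, assuming a hypothetical five-minute freshness budget (`MAX_AGE`) and a timestamp recorded by whatever job refreshes the page:

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical freshness budget for the automated refresh job.
MAX_AGE = timedelta(minutes=5)


def freshness_banner(last_refreshed: datetime, now: Optional[datetime] = None) -> str:
    """Return a banner telling the on-call whether the live-state page is current."""
    # Default to the current UTC time; callers can pass `now` explicitly (e.g. in tests).
    now = now if now is not None else datetime.now(timezone.utc)
    age = now - last_refreshed
    if age > MAX_AGE:
        minutes = int(age.total_seconds() // 60)
        return f"STALE: last automatic refresh {minutes} min ago; do not trust this page"
    return "FRESH"
```

Both timestamps should be timezone-aware (or both naive); Python refuses to subtract one kind from the other.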
Surfaces in-flight incidents without scrolling. It fits on a single screen, so it can be checked during the handoff conversation with the previous on-call.
Page 2: recent deploys
Every deploy in the last 4 hours, with service, version, and one-line description. Sorted reverse-chronological.
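The default view of page 2 amounts to a time-window filter plus a reverse-chronological sort. A sketch with a hypothetical `Deploy` record holding the service, version, and one-line description, using the 4-hour window from the text:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

WINDOW = timedelta(hours=4)


@dataclass
class Deploy:
    service: str
    version: str
    description: str  # one line
    deployed_at: datetime


def recent_deploys(deploys: list[Deploy], now: datetime, window: timedelta = WINDOW) -> list[Deploy]:
    """Deploys from the last `window`, newest first; anything older falls off."""
    cutoff = now - window
    recent = [d for d in deploys if cutoff <= d.deployed_at <= now]
    return sorted(recent, key=lambda d: d.deployed_at, reverse=True)


def render(deploys: list[Deploy]) -> list[str]:
    """One line per deploy: time, service, version, description."""
    return [f"{d.deployed_at:%H:%M} {d.service} {d.version} {d.description}" for d in deploys]
```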
When an alert fires, page 2 is the first place to look: recent changes are the most common cause of new incidents, so "what changed?" is the first question to answer.
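One way to turn "what changed?" into a ranked list is to keep only deploys that landed before the alert fired, then put deploys to the alerting service ahead of everything else. This is a sketch under assumptions: the `Deploy` record and `suspects_for_alert` helper are hypothetical, and real incidents can of course be caused by a deploy to a dependency, which is why other services stay in the list rather than being dropped.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Deploy:
    service: str
    version: str
    deployed_at: datetime


def suspects_for_alert(deploys: list[Deploy], alert_service: str, fired_at: datetime,
                       window: timedelta = timedelta(hours=4)) -> list[Deploy]:
    """Rank recent deploys as suspects for an alert on `alert_service`."""
    # Only deploys inside the window that landed before the alert can be suspects.
    candidates = [d for d in deploys if fired_at - window <= d.deployed_at <= fired_at]
    # Newest first...
    by_recency = sorted(candidates, key=lambda d: d.deployed_at, reverse=True)
    # ...then a stable sort moves same-service deploys to the front (False sorts before True).
    return sorted(by_recency, key=lambda d: d.service != alert_service)
```

The two-pass sort relies on Python's sort being stable, so recency order is preserved within each group.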
Deploys older than 4 hours fall off; the operator can dig deeper if needed but the default surface is recent.
Page 3: escalation paths
Per-service: who to page for what kind of failure. Database team for DB issues; network team for connectivity; etc.
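Per-service routing by failure kind can be expressed as a lookup keyed on the pair `(service, failure kind)`. The table contents, team names, and the fallback to each service's own rotation (`OWNERS`) are hypothetical, chosen only to illustrate the shape:

```python
# Hypothetical routing table: (service, failure kind) -> team to page.
ROUTES: dict[tuple[str, str], str] = {
    ("checkout", "database"): "database-team",
    ("checkout", "network"): "network-team",
    ("search", "database"): "database-team",
}

# Hypothetical per-service fallback when no specific route matches.
OWNERS: dict[str, str] = {
    "checkout": "checkout-oncall",
    "search": "search-oncall",
}


def who_to_page(service: str, failure_kind: str) -> str:
    """Return the team to page for this kind of failure on this service."""
    if (service, failure_kind) in ROUTES:
        return ROUTES[(service, failure_kind)]
    # Unrecognized failure kind: fall back to the service's own rotation.
    # An unknown service raises KeyError on purpose, so gaps surface loudly.
    return OWNERS[service]
```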
Includes secondary on-calls, escalation managers, and 'do not disturb' lists.
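A sketch of how secondaries, escalation managers, and a "do not disturb" list might combine into one ordered paging chain. It assumes the do-not-disturb list names people who must be skipped (for example, someone on leave); the `EscalationPolicy` class and its fields are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class EscalationPolicy:
    primary: str
    secondary: str
    escalation_manager: str
    # Assumption: people on this list must not be paged at all.
    do_not_disturb: set[str] = field(default_factory=set)

    def chain(self) -> list[str]:
        """Ordered list of people to page, skipping anyone on the do-not-disturb list."""
        ordered = [self.primary, self.secondary, self.escalation_manager]
        return [person for person in ordered if person not in self.do_not_disturb]
```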
Single source of truth. If escalation paths live in four places, the copies drift apart and mistakes happen during incidents. Consolidate.
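Before consolidating, it helps to find where the existing copies already disagree. A minimal sketch, assuming each source (wiki page, paging tool, runbook, and so on; the names are hypothetical) can be exported as a simple `{service: team}` mapping:

```python
def find_conflicts(sources: dict[str, dict[str, str]]) -> dict[str, dict[str, str]]:
    """Return services whose escalation target differs between sources.

    `sources` maps a source name to its {service: team} mapping. The result maps
    each conflicting service to {source name: team} so the disagreement is visible.
    """
    seen: dict[str, dict[str, str]] = {}
    for source_name, mapping in sources.items():
        for service, team in mapping.items():
            seen.setdefault(service, {})[source_name] = team
    # A service conflicts when its sources name more than one distinct team.
    return {svc: by_src for svc, by_src in seen.items() if len(set(by_src.values())) > 1}
```

Each conflict it reports is a place where the on-call could page the wrong team mid-incident; resolving them is the first step of consolidation.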