The Three-Page Rule for the On-Call Mental Model
A mental model for on-call in which each part fits on one page. Three pages, one each for live state, recent deploys, and escalation paths, and why three is the right number.
Page 1: live state
Page one is the current state of every service the on-call owns. The whole picture in one screen, refreshed automatically; the on-call should never be the source of staleness.
- One row per service. Latency, error rate, saturation, recent alerts; status icon for at-a-glance health.
- Auto-updated. The page refreshes itself and the on-call never maintains it; a page that can go stale cannot be trusted.
- Single screen. Surfaces incidents in flight without scrolling; can be checked during the handoff call.
- Anchored to ownership. Only services this on-call rotation owns; cross-team noise belongs on someone else’s page one.
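The four properties above can be sketched as a small data model. This is a minimal illustration, not a reference implementation: the names `ServiceRow`, `status`, and `page_one`, the two-minute freshness budget, and the alert thresholds are all assumptions made for the example and would be tuned per team.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Hypothetical freshness budget; the text only requires that the page refresh itself.
MAX_AGE = timedelta(minutes=2)


@dataclass(frozen=True)
class ServiceRow:
    service: str
    owner: str          # rotation that owns the service
    latency_ms: float
    error_rate: float   # fraction of failing requests, 0.0 to 1.0
    saturation: float   # fraction of capacity in use, 0.0 to 1.0
    open_alerts: int
    updated_at: datetime


def status(row, now):
    """Return an at-a-glance status; stale data is never shown as healthy."""
    if now - row.updated_at > MAX_AGE:
        return "STALE"
    # Illustrative thresholds only.
    if row.open_alerts > 0 or row.error_rate > 0.05 or row.saturation > 0.9:
        return "ALERT"
    return "OK"


def page_one(rows, rotation, now):
    """One (service, status) pair per service owned by this rotation."""
    return [(r.service, status(r, now)) for r in rows if r.owner == rotation]


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    rows = [
        ServiceRow("api", "payments", 120.0, 0.01, 0.5, 0, now),
        ServiceRow("search", "discovery", 80.0, 0.0, 0.2, 0, now),
    ]
    for service, state in page_one(rows, "payments", now):
        print(f"{service:10s} {state}")
```

Marking a row as stale rather than showing its last known value keeps the page honest: a frozen "OK" is worse than an explicit gap.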
Page 2: recent deploys
Page two is every deploy in the last 4 hours: service, version, one-line description, sorted reverse-chronological. When an alert fires, page two is the first place to look, because a recent change is the most common cause of new incidents and "what changed?" is the first question to answer.
- Last 4 hours only. The default surface is recent; older deploys fall off; dig deeper only if the recent set does not explain the alert.
- Service plus version plus description. Three columns; everything else is noise during an incident.
- Reverse chronological. The most recent deploy is the most likely cause; ordering matters for fast triage.
- Linked to deploy logs. One click from the row to the build log, the PR, and the rollback button; remove the friction.
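The windowing and ordering rules above amount to a filter and a sort. The sketch below assumes deploy records are already available as a list; the `Deploy` type and the `page_two` function are hypothetical names for illustration, and the links to build logs and rollback are left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

WINDOW = timedelta(hours=4)  # the "last 4 hours" default from the text


@dataclass(frozen=True)
class Deploy:
    service: str
    version: str
    description: str
    deployed_at: datetime


def page_two(deploys, now, window=WINDOW):
    """Deploys inside the window, most recent first."""
    recent = [d for d in deploys if timedelta(0) <= now - d.deployed_at <= window]
    return sorted(recent, key=lambda d: d.deployed_at, reverse=True)


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    history = [
        Deploy("api", "v41", "bump client timeout", now - timedelta(hours=5)),
        Deploy("api", "v42", "fix retry loop", now - timedelta(hours=3)),
        Deploy("db-proxy", "v7", "add read replica", now - timedelta(minutes=30)),
    ]
    for d in page_two(history, now):
        print(f"{d.service:10s} {d.version:5s} {d.description}")
```

Passing a larger `window` is the "dig deeper" path: the default stays at four hours, and widening it is an explicit choice made only when the recent set does not explain the alert.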
Page 3: escalation paths
Page three is per-service escalation: who to page for what kind of failure. Database team for DB issues; network team for connectivity; security for credential leaks. It is the single source of truth: if escalation paths live in four places, they drift apart, and mistakes happen during incidents.
- Per-service primary contact. The team or person to page first for a given service or failure class.
- Secondary on-call plus escalation manager. Two layers of fallback; documented, not assumed.
- "Do not disturb" lists. People on holiday, off call, or on sabbatical; the list is honoured whenever someone is paged during an incident.
- Single source of truth. One canonical page; not a wiki entry, a Slack pin, and a Google doc all drifting separately.
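The fallback order described above (primary, then secondary, then escalation manager, skipping anyone on a "do not disturb" list) can be sketched as a single lookup. `EscalationPath`, `who_to_page`, and the contact names are hypothetical, and the table is keyed by service only; keying by failure class would work the same way.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EscalationPath:
    primary: str
    secondary: str
    manager: str   # escalation manager, the last layer of fallback


def who_to_page(paths, service, do_not_disturb):
    """Return the first contact for the service who is not on the
    do-not-disturb list, or None if every layer is unavailable."""
    path = paths[service]
    for contact in (path.primary, path.secondary, path.manager):
        if contact not in do_not_disturb:
            return contact
    return None


if __name__ == "__main__":
    # One canonical table; illustrative data only.
    paths = {
        "checkout-db": EscalationPath("db-oncall", "db-secondary", "db-manager"),
        "edge-proxy": EscalationPath("net-oncall", "net-secondary", "net-manager"),
    }
    print(who_to_page(paths, "checkout-db", do_not_disturb={"db-oncall"}))
```

Returning `None` when every layer is unavailable makes that gap visible to the caller instead of silently paging someone who is on the list.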