Escalation Policy Design: Three-Tier Pattern
Escalation policies have to be predictable and unambiguous. Three tiers are the minimum that handles real situations.
Why escalation matters
Single-tier paging fails the moment the on-call's phone is silenced or the engineer is asleep. Escalation is the safety net that turns a missed page into a delayed page rather than an outage.
- Without escalation. A missed primary page is the outage; nobody else hears about it until customers do.
- With escalation. Missed pages reach a human within minutes; the system fails forward, not silent.
- Trust failure. Phones die, batteries fail, alarms get muted; the policy assumes none of these are reliable.
- Audit proof. Compliance frameworks expect documented escalation; the trail is part of the evidence.
Three-tier pattern
- Tier 1: primary on-call. Pages immediately.
- Tier 2: secondary. Pages after N minutes no-ack.
- Tier 3: manager / broader team. Pages after secondary no-ack.
Per-team timing
One escalation cadence for every severity wastes higher tiers on minor issues. Per-severity timing keeps the policy useful and the secondary on-call sustainable.
- SEV1. T1 immediate, T2 at +5 minutes no-ack, T3 manager at +15 minutes.
- SEV2. T1 immediate, T2 at +15 minutes, T3 at +60 minutes; lower urgency, longer windows.
- SEV3 / SEV4. Single tier, business-hours only; no off-hours escalation.
- Severity drives cadence. One policy for all severities loses signal and overuses higher tiers.
No-ack defaults
The escalation must fire without human cooperation. At 3am, engineering judgement is unreliable; the policy compensates by escalating automatically.
- Auto-escalate. Page escalates after the no-ack window without anyone clicking anything.
- Engineer judgement removed. At 3am, the on-call should not have to think to escalate; the system does it.
- Manual override. On-call can suppress escalation if they have it under control; default is auto, override is explicit.
- Test quarterly. Synthetic page in the staging policy verifies escalation actually fires; do not assume.
Antipatterns
- One-tier escalation. Lost pages.
- Manual escalation. Forgotten in the moment.
- Same timing for all severities. Wastes higher tiers on minor.
What to do this week
Three moves. (1) Apply this practice to your next on-call rotation. (2) Survey the team after one cycle. (3) Iterate based on feedback; the discipline is the cadence.