On-Call Escalation Tree
Up the tree.
Overview
The on-call escalation tree defines who gets paged after the primary fails to acknowledge, in what order, with what timeout per tier. Without a tree, an unacknowledged page either dies in silence or fans out to everyone simultaneously. With one, the page walks deterministically from primary to secondary to manager to cross-team backup, each tier with a defined wait that balances responsiveness against unnecessary wakeups.
- Up the tree. Each incident follows one escalation path: primary, secondary, manager, then cross-team backup. Each tier is named and reachable.
- Per-tier timeout. Each tier has a defined wait before escalation, typically 5 minutes for the primary, 10 for the secondary, and 15 for the manager, tuned to how quickly operators actually respond.
- Manager on-call. Each team has a manager backup tier that provides authority for cross-team coordination during long incidents.
- Cross-team escalation plus committed tree. The cross-team escalation path is documented, and each team's tree is committed to the runbook for onboarding.
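To make the walk concrete, here is a minimal sketch of a tree as plain data, plus a function that answers "who holds the page after N unacknowledged minutes?". The contact names, the `Tier` class, and the `tier_at` helper are hypothetical, not part of any paging product; the timeouts are the typical values listed above, and treating cross-team as a final tier with no timeout is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Tier:
    name: str
    contact: str                 # hypothetical person on rotation
    timeout_min: Optional[int]   # None marks the final tier (assumption)


# Typical timeouts from the list above; contacts are placeholders.
TREE = [
    Tier("primary", "alice", 5),
    Tier("secondary", "bob", 10),
    Tier("manager", "carol", 15),
    Tier("cross-team", "dave", None),
]


def tier_at(tree: List[Tier], minutes_unacked: int) -> Tier:
    """Return the tier holding the page after this many minutes without an ack."""
    deadline = 0
    for tier in tree:
        if tier.timeout_min is None:
            return tier          # final tier: nowhere further to escalate
        deadline += tier.timeout_min
        if minutes_unacked < deadline:
            return tier
    return tree[-1]              # every timeout exhausted: stay on the last tier


for minutes in (0, 5, 15, 30):
    print(minutes, tier_at(TREE, minutes).name)
```

With these numbers the page reaches the secondary at minute 5, the manager at minute 15, and the cross-team backup at minute 30, because each tier's wait starts when the previous tier times out.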
The approach
The practical approach is explicit tier definitions, per-tier timeouts tuned to actual operator response patterns, manager-on-call as an explicit role (not "ask Slack"), cross-team escalation paths documented per service boundary, and the whole tree committed to the team handbook so incident response does not have to invent the path under pressure.
- Tier definition. Each tier is a named role with a named person on rotation: a person, not "the team".
- Timeout tuning. Tighter timeouts for sev1, looser for sev3; the tree shape matches the urgency.
- Manager on-call. A per-team manager rotation provides the authority for cross-team asks during long incidents.
- Cross-team path plus documented tree. Cross-team escalation contacts are listed per service, and each team's tree is committed for onboarding.
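One way to express severity-based tuning is to scale a baseline set of timeouts. The sketch below is illustrative only: the baseline reuses the typical 5/10/15 minute values, while the scale factors (half for sev1, double for sev3) and the `timeouts_for` helper are assumptions to replace with numbers taken from your own response data.

```python
import math

# Baseline per-tier timeouts in minutes, treated here as the sev2 values (assumption).
BASE_TIMEOUTS_MIN = {"primary": 5, "secondary": 10, "manager": 15}

# Hypothetical scale factors: tighter for sev1, looser for sev3.
SEVERITY_SCALE = {"sev1": 0.5, "sev2": 1.0, "sev3": 2.0}


def timeouts_for(severity: str) -> dict:
    """Scale each tier's timeout by severity, rounding up to whole minutes."""
    scale = SEVERITY_SCALE[severity]
    return {
        tier: max(1, math.ceil(base * scale))
        for tier, base in BASE_TIMEOUTS_MIN.items()
    }


for severity in ("sev1", "sev2", "sev3"):
    print(severity, timeouts_for(severity))
```

Rounding up keeps every timeout at a whole number of minutes and never below one, so a sev1 page still gives the current tier a moment to acknowledge before moving on.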
Why this compounds
Escalation tree discipline compounds across incidents. Each clean escalation reduces time-to-coordination; each documented path survives team turnover; the team’s ability to run a long incident without confusion grows quarter over quarter. The opposite, where every long incident requires inventing the tree on the fly, never gets faster.
- Coordination. The right tree matches the shape of the incident; the IC reaches the right person without negotiation.
- Resilience. Cross-team escalation supports cross-cutting incidents; the right authority arrives without waiting for working hours.
- Operational hygiene. A quarterly tree review catches drift in roles, contacts, and timeouts before the next 3am page.
- Institutional knowledge. Each escalation teaches coordination patterns; the team builds a vocabulary for who to reach when.
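The quarterly review mentioned above can be partly mechanical. The sketch below assumes the tree is stored as a list of records with hypothetical `contact`, `timeout_min`, and `last_verified` fields, and flags empty contacts, non-positive timeouts, and contacts not verified within roughly a quarter. The 90-day threshold and the sample data are illustrative.

```python
from datetime import date


def review_findings(tree, today, max_age_days=90):
    """Return human-readable drift findings for an escalation tree."""
    findings = []
    for tier in tree:
        name = tier["name"]
        if not tier.get("contact"):
            findings.append(f"{name}: no named contact")
        timeout = tier.get("timeout_min")
        if timeout is not None and timeout <= 0:
            findings.append(f"{name}: non-positive timeout")
        age_days = (today - tier["last_verified"]).days
        if age_days > max_age_days:
            findings.append(f"{name}: contact last verified {age_days} days ago")
    return findings


# Sample data; names and dates are placeholders.
sample_tree = [
    {"name": "primary", "contact": "alice", "timeout_min": 5,
     "last_verified": date(2024, 5, 1)},
    {"name": "secondary", "contact": "", "timeout_min": 10,
     "last_verified": date(2024, 5, 1)},
    {"name": "manager", "contact": "carol", "timeout_min": 15,
     "last_verified": date(2023, 11, 1)},
]

for finding in review_findings(sample_tree, today=date(2024, 6, 1)):
    print(finding)
```

A check like this does not replace the human review, but running it against the committed tree gives the review a concrete starting list.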
Escalation tree discipline pays off across years. Nova AI Ops integrates with on-call telemetry, surfaces escalation patterns, and supports the team’s incident-response discipline.