On-Call Escalation Tree

Up the tree.

Overview

The on-call escalation tree defines who gets paged after the primary fails to acknowledge, in what order, and with what timeout at each tier. Without a tree, an unacknowledged page either dies in silence or fans out to everyone at once. With one, the page walks deterministically from primary to secondary to manager to cross-team backup, and each tier has a defined wait that balances responsiveness against unnecessary wakeups.
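As a minimal sketch of that deterministic walk, the tree can be modeled as an ordered list of tiers, each with its own acknowledgement timeout. The tier names follow the paragraph above; the timeout values and the `tier_at` helper are illustrative assumptions, not recommended settings or any paging tool's API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Tier:
    name: str
    timeout_minutes: int  # wait for an acknowledgement before escalating


# Hypothetical timeouts, chosen only for illustration.
TREE: List[Tier] = [
    Tier("primary", 5),
    Tier("secondary", 10),
    Tier("manager", 15),
    Tier("cross-team backup", 15),
]


def tier_at(tree: List[Tier], minutes_unacked: int) -> Optional[Tier]:
    """Return the tier holding the page after `minutes_unacked` minutes.

    Returns None once every tier's timeout has elapsed, meaning the tree
    is exhausted and the page has nowhere left to go.
    """
    deadline = 0
    for tier in tree:
        deadline += tier.timeout_minutes
        if minutes_unacked < deadline:
            return tier
    return None


if __name__ == "__main__":
    for minute in (0, 5, 15, 30, 45):
        tier = tier_at(TREE, minute)
        print(minute, tier.name if tier else "tree exhausted")
```

Because the tiers are an ordered list with cumulative deadlines, the same elapsed time always maps to the same tier, which is the property that separates a tree from a silent drop or a simultaneous fan-out.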

The approach

The practical approach has five parts: explicit tier definitions; per-tier timeouts tuned to how quickly operators actually respond; manager-on-call as an explicit role (not "ask Slack"); cross-team escalation paths documented per service boundary; and the whole tree committed to the team handbook, so that incident responders do not have to invent the path under pressure.
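One way to keep a handbook entry honest is to check it mechanically before it is committed. The sketch below assumes a hypothetical record shape (`role`, `contact`, `timeout_minutes`), hypothetical role names, and a made-up `checkout` service; it only verifies that every tier is present in order, has a named contact, and has a positive timeout.

```python
from typing import Dict, List

# Assumed role names, in escalation order; adjust to the team's own terms.
REQUIRED_ROLES = ["primary", "secondary", "manager-on-call", "cross-team-backup"]


def validate_tree(service: str, tiers: List[Dict]) -> List[str]:
    """Return a list of problems with a service's escalation tree.

    An empty list means the tree defines every tier explicitly.
    """
    problems: List[str] = []
    roles = [t.get("role") for t in tiers]
    if roles != REQUIRED_ROLES:
        problems.append(
            f"{service}: tiers must be {REQUIRED_ROLES} in order, got {roles}"
        )
    for t in tiers:
        role = t.get("role", "<unnamed>")
        # An empty contact is the "ask Slack" failure mode: nobody is named.
        if not t.get("contact"):
            problems.append(f"{service}: tier {role} has no named contact")
        timeout = t.get("timeout_minutes")
        if not isinstance(timeout, int) or timeout <= 0:
            problems.append(f"{service}: tier {role} needs a positive timeout_minutes")
    return problems


# Hypothetical handbook entry for an imaginary "checkout" service.
CHECKOUT_TREE = [
    {"role": "primary", "contact": "checkout-primary", "timeout_minutes": 5},
    {"role": "secondary", "contact": "checkout-secondary", "timeout_minutes": 10},
    {"role": "manager-on-call", "contact": "checkout-manager", "timeout_minutes": 15},
    {"role": "cross-team-backup", "contact": "payments-primary", "timeout_minutes": 15},
]

if __name__ == "__main__":
    print(validate_tree("checkout", CHECKOUT_TREE) or "tree is complete")
```

A check like this could run wherever the handbook is stored, so that a missing manager-on-call or an undocumented cross-team path is caught at review time rather than mid-incident.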

Why this compounds

Escalation tree discipline compounds across incidents. Each clean escalation reduces time-to-coordination; each documented path survives team turnover; the team’s ability to run a long incident without confusion grows quarter over quarter. The opposite, where every long incident requires inventing the tree on the fly, never gets faster.

Maintaining the escalation tree is an operational practice that pays off over years. Nova AI Ops integrates with on-call telemetry, surfaces escalation patterns, and supports the team’s incident-response discipline.