Tier 1 vs Tier 2 Incident Response Teams
Some orgs split incident response into a front-line tier and a deep-dive tier. This is the tier model.
Tier 1
Tier 1 is the front-line responder pool. Broad coverage, fast triage, escalation discipline. The tier handles the easy 80 percent of incidents and escalates the rest.
- First responders. A named tier-1 responder per shift. Triages, acts on known issues from the runbook, escalates when stuck.
- Trained for speed. Optimised for time-to-acknowledge. Runbook coverage matters more than deep domain expertise.
- Runbook ownership. Each runbook names a tier-1 owner. The 80 percent of cases with a clear runbook stay in tier 1.
- Wider rotation depth. Tier-1 rotation is broader because the work is broader. Sustainable on-call requires the depth.
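The tier-1 flow above (triage, act from the runbook, escalate when stuck) can be sketched as a lookup. This is a minimal illustration, not a prescribed implementation: the `Runbook` fields, the alert names, and the `triage` function are all hypothetical, and it assumes "stuck" simply means no runbook covers the alert.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Runbook:
    alert_name: str           # hypothetical key: the alert this runbook covers
    tier1_owner: str          # each runbook names a tier-1 owner
    steps: Tuple[str, ...]    # known remediation steps


def triage(alert_name: str, runbooks: Dict[str, Runbook]) -> str:
    """Return 'tier1' when a runbook covers the alert, else 'escalate'."""
    if alert_name in runbooks:
        return "tier1"
    return "escalate"


# Illustrative data only; names are made up for the example.
runbooks = {
    "disk_full": Runbook("disk_full", "alice", ("rotate logs", "expand volume")),
}
known = triage("disk_full", runbooks)
unknown = triage("replica_split_brain", runbooks)
```

A real router would also consult the escalation criteria written in each runbook; this sketch covers only the "is there a runbook" branch.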
Tier 2
Tier 2 is the depth specialist pool. Engaged on the hard incidents tier 1 cannot resolve. Smaller pool, deeper knowledge, optimised for the long-tail cases.
- Deeper expertise. Per-domain specialist (database, networking, identity). Engaged when tier 1 escalates.
- Smaller pool, less rotation. Tier-2 rotation is narrower because the cases are rarer. Optimised for hard cases, not for fast ack.
- Documented escalation criteria. “When to escalate” written down per runbook. Catches both premature escalations and late ones.
- Postmortem ownership. Tier 2 authors postmortems for the incidents it resolves. The deeper context lives with the deeper responders.
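One way to make "when to escalate" checkable is to write the criteria as data and review each escalation against them. The sketch below assumes two hypothetical criteria per runbook (a minimum number of runbook steps tried and a maximum time held in tier 1); real criteria will differ by domain.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EscalationCriteria:
    # Both fields are assumptions made for illustration.
    min_runbook_steps_tried: int   # escalate only after trying this many steps
    max_minutes_in_tier1: int      # escalate no later than this


def review_escalation(criteria: EscalationCriteria,
                      steps_tried: int,
                      minutes_in_tier1: int) -> str:
    """Classify an escalation as 'late', 'premature', or 'on_time'."""
    # Exceeding the time limit is late regardless of steps tried.
    if minutes_in_tier1 > criteria.max_minutes_in_tier1:
        return "late"
    # Within the time limit but too few steps tried: escalated too early.
    if steps_tried < criteria.min_runbook_steps_tried:
        return "premature"
    return "on_time"


criteria = EscalationCriteria(min_runbook_steps_tried=3, max_minutes_in_tier1=30)
```

Running `review_escalation` over past incidents gives a rough count of premature and late escalations per runbook, which shows which criteria need rewriting.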
When to use it
The two-tier model fits large organisations. Below 1000 engineers, a combined responder model is usually better; the handoff cost outweighs the specialisation benefit.
- Larger orgs (1000+ engineers). Tier separation pays off. Specialisation reduces tier-1 burnout and gives tier 2 enough cases to stay sharp.
- Smaller orgs. Combined model wins. Fewer handoffs, faster MTTR, simpler on-call rotation.
- Handoff cost. Each tier transition loses context. Over-tiering at small scale costs more than it saves.
- Documented model. Tier definitions written down. New engineers and managers reference the same model.
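The sizing rule above can be written down as a small helper so the documented model and the tooling agree. This is a sketch of the rule of thumb only: the 1000-engineer threshold comes from the text, while the function name and return values are hypothetical.

```python
TIER_SPLIT_THRESHOLD = 1000  # engineers; the rule-of-thumb figure from the text


def recommend_model(engineer_count: int) -> str:
    """Return 'two_tier' at or above the threshold, else 'combined'."""
    if engineer_count < 0:
        raise ValueError("engineer_count must be non-negative")
    if engineer_count >= TIER_SPLIT_THRESHOLD:
        return "two_tier"
    return "combined"
```

Treat the result as a starting point for discussion; handoff cost and incident volume in a given organisation may justify a different choice.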