Tier 1 vs Tier 2 Incident Response Teams

Some orgs split incident response into a front-line team and a deep-dive team. This is the tier model.

Tier 1

Tier 1 is the front-line responder pool. Broad coverage, fast triage, escalation discipline. The tier handles the easy 80 percent of incidents and escalates the rest.

First responders. A named tier-1 responder per shift triages, acts on known issues from the runbook, and escalates when stuck.
Trained for speed. Optimised for time-to-acknowledge. Runbook coverage matters more than deep domain expertise.
Runbook ownership. Each runbook names a tier-1 owner. The 80 percent of cases with a clear runbook stay in tier 1.
Wider rotation. The tier-1 rotation has more people because the work is broader. Sustainable on-call requires that depth.
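
The tier-1 flow above (triage, act on a known issue from the runbook, escalate when stuck) can be sketched in Python. This is a minimal illustration under stated assumptions, not a prescribed implementation: the `Runbook` and `Incident` types, the `signature` field, the `RUNBOOKS` index and its entries, and the `triage` function are all hypothetical names invented for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Runbook:
    name: str
    tier1_owner: str  # each runbook names a tier-1 owner


@dataclass(frozen=True)
class Incident:
    summary: str
    signature: str  # hypothetical key that matches a known issue to a runbook


# Hypothetical runbook index keyed by incident signature.
RUNBOOKS = {
    "disk-full": Runbook("Free disk space", tier1_owner="alice"),
    "cert-expired": Runbook("Rotate TLS certificate", tier1_owner="bob"),
}


def triage(incident: Incident) -> str:
    """Return the tier-1 action: follow a runbook, or escalate to tier 2."""
    runbook: Optional[Runbook] = RUNBOOKS.get(incident.signature)
    if runbook is not None:
        # Known issue: the case stays in tier 1.
        return f"tier1: follow runbook '{runbook.name}'"
    # No runbook covers this case, so tier 1 escalates instead of improvising.
    return "escalate: no runbook, engage tier 2"
```

A lookup keeps the decision fast, which matches the time-to-acknowledge focus. A real system would likely match on richer signals than a single key.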

Tier 2

Tier 2 is the specialist pool, engaged on the hard incidents tier 1 cannot resolve. Smaller pool, deeper knowledge, optimised for the long-tail cases.

Deeper expertise. Per-domain specialists (database, networking, identity). Engaged when tier 1 escalates.
Smaller pool, less rotation. Tier-2 rotation is narrower because the cases are rarer. Optimised for hard cases, not for fast ack.
Documented escalation criteria. “When to escalate” written down per runbook. Catches both premature escalations and late ones.
Postmortem ownership. Tier 2 authors postmortems for the incidents it resolved. The deeper context lives with the deeper responders.
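
One way to make written escalation criteria checkable is to store them as data per runbook and classify each escalation against them. The sketch below is only an assumption about what such criteria might contain: the `EscalationCriteria` fields (a time budget and a minimum number of runbook steps) and the `classify_escalation` function are hypothetical, and real criteria would be defined by each runbook's owner.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EscalationCriteria:
    # Hypothetical fields; real criteria are written down per runbook.
    max_minutes_in_tier1: int     # time budget before tier 1 must escalate
    min_runbook_steps_tried: int  # steps tier 1 should try before escalating


def classify_escalation(
    criteria: EscalationCriteria, minutes_elapsed: int, steps_tried: int
) -> str:
    """Classify an escalation as 'premature', 'late', or 'on-time'."""
    within_budget = minutes_elapsed < criteria.max_minutes_in_tier1
    if within_budget and steps_tried < criteria.min_runbook_steps_tried:
        # Escalated early without working through the runbook.
        return "premature"
    if minutes_elapsed > criteria.max_minutes_in_tier1:
        # Tier 1 held the incident past its time budget.
        return "late"
    return "on-time"
```

For example, with `EscalationCriteria(max_minutes_in_tier1=30, min_runbook_steps_tried=3)`, escalating after 5 minutes and one step is classified as premature, and escalating after 45 minutes as late.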

When to use it

The two-tier model fits large organisations. Below 1000 engineers, a combined responder model is usually better; the handoff cost outweighs the specialisation benefit.

Larger orgs (1000+ engineers). Tier separation pays off. Specialisation reduces tier-1 burnout and gives tier 2 enough cases to stay sharp.
Smaller orgs. Combined model wins. Fewer handoffs, faster MTTR, simpler on-call rotation.
Handoff cost. Each tier transition loses context. Over-tiering at small scale costs more than it saves.
Documented model. Tier definitions written down. New engineers and managers reference the same model.
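
The sizing rule of thumb above can be written down as a small helper, which is one way to keep the documented model unambiguous. This is a sketch only: the 1000-engineer threshold comes from the text, while the constant and function names are hypothetical, and headcount is treated as the sole input purely for illustration.

```python
TIERING_THRESHOLD = 1000  # engineers; the rule-of-thumb figure from the text


def recommend_model(engineer_count: int) -> str:
    """Suggest a responder model from engineering headcount alone."""
    if engineer_count < 0:
        raise ValueError("engineer_count must be non-negative")
    # At 1000+ engineers the specialisation benefit is expected to outweigh
    # the handoff cost; below that, the combined model is usually better.
    return "two-tier" if engineer_count >= TIERING_THRESHOLD else "combined"
```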