The On-Call Cool-Down Period

After a major incident, the on-call takes a cool-down. The rest reduces secondary errors.

The cool-down protocol

The cool-down protocol is mandatory rest after major incidents: typically 30-60 minutes of explicit recovery before resuming normal work, and longer for the most severe incidents. Rest reduces secondary errors because a tired on-call making decisions immediately after a sev 1 is more likely to compound the incident. The backup on-call covers the cool-down window.

Mandatory rest. 30-60 minutes of explicit recovery after major incidents; not optional.
Reduces secondary errors. A tired on-call making decisions immediately after a sev 1 is more likely to compound the incident.
Backup covers. Not a vacation; an explicit handoff for a bounded window.
Per-incident protocol invocation. The cool-down is a named protocol with a named trigger, not a vague suggestion.

When to invoke

The cool-down has predictable triggers: always after a sev 1, after a long sev 2, and after a string of consecutive sev 3 incidents, because cumulative load is itself a source of fatigue. The trigger criteria are documented so that invocation is automatic, not a debate.

Sev 1. Always; the highest-stakes incidents drain the on-call regardless of duration.
Sev 2 over 4 hours. Long durations are draining even at lower severity.
Consecutive sev 3. Multiple in a shift; the cumulative load is the issue, not any single page.
Per-trigger documented criteria. The trigger is committed to the runbook; invocation is automatic, not a debate.
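
The trigger criteria above could be written down as a small check, so that invocation does not depend on anyone's judgment in the moment. This is a minimal sketch, not a prescribed implementation: the names Incident and cooldown_triggered are hypothetical, and the threshold of three sev 3 pages per shift is an assumption, since the text only says "multiple in a shift". The 4-hour sev 2 cutoff is taken from the list above.

```python
from dataclasses import dataclass
from datetime import timedelta

# From the text: a sev 2 running over 4 hours triggers a cool-down.
SEV2_LONG_DURATION = timedelta(hours=4)
# Assumption: the text says "multiple in a shift" without a number.
SEV3_PER_SHIFT_THRESHOLD = 3


@dataclass
class Incident:
    severity: int        # 1, 2, or 3
    duration: timedelta  # time from page to resolution


def cooldown_triggered(incident: Incident, sev3_count_this_shift: int = 0) -> bool:
    """Return True when the documented criteria call for a cool-down.

    sev3_count_this_shift counts sev 3 incidents in the current shift,
    including the one being evaluated.
    """
    if incident.severity == 1:
        # Sev 1 always triggers, regardless of duration.
        return True
    if incident.severity == 2 and incident.duration > SEV2_LONG_DURATION:
        return True
    if incident.severity == 3 and sev3_count_this_shift >= SEV3_PER_SHIFT_THRESHOLD:
        # Cumulative load is the issue, not any single page.
        return True
    return False
```

Keeping the thresholds as named constants means the runbook and the check can be reviewed side by side when the criteria change.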

How long

Cool-down length scales with severity. The minimum is 30 minutes. It runs longer for a multi-hour sev 1 or a customer-facing data issue, up to 2 hours when leadership coordination was required, and a half-day for catastrophic incidents (data loss, major outages, security events), because the need for recovery is real and shortcutting it produces secondary incidents.

30 minutes minimum. The base recovery window; longer for severe incidents.
Up to 2 hours. Incidents that involved customer impact or leadership coordination.
Half-day for catastrophic. Data loss, major outages, security events; the recovery is real.
Per-severity duration table. The duration mapping is documented, which supports consistent invocation across the team.
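
One way to document the duration mapping is as a lookup table that everyone reads the same way. The sketch below is illustrative only: the class names (STANDARD, COORDINATED, CATASTROPHIC) are hypothetical labels for the three tiers described above, the 2-hour entry is the upper bound the text gives rather than a fixed value, and treating a half-day as 4 hours is an assumption.

```python
from datetime import timedelta
from enum import Enum


class IncidentClass(Enum):
    STANDARD = "standard"          # base case: any incident that triggered a cool-down
    COORDINATED = "coordinated"    # customer impact or leadership coordination
    CATASTROPHIC = "catastrophic"  # data loss, major outage, security event


COOLDOWN_DURATION = {
    IncidentClass.STANDARD: timedelta(minutes=30),    # the documented minimum
    IncidentClass.COORDINATED: timedelta(hours=2),    # "up to 2 hours": upper bound
    IncidentClass.CATASTROPHIC: timedelta(hours=4),   # assumption: half-day = 4 hours
}


def cooldown_duration(incident_class: IncidentClass) -> timedelta:
    """Look up the cool-down window for an incident class."""
    return COOLDOWN_DURATION[incident_class]
```

For example, cooldown_duration(IncidentClass.COORDINATED) returns a two-hour window, which the backup on-call can use to bound the handoff.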

What to do during cool-down

Cool-down is recovery, not reduced-intensity work. Step away from the keyboard: walk, eat, rest, or do anything else except continue incident work. A brief debrief with the team is acceptable if it helps process the experience. The first draft of the postmortem can wait until the on-call is rested.

Step away from the keyboard. Walk, eat, rest; anything except continued incident work.
Brief debrief acceptable. Process the experience with teammates if it helps; do not turn it into work.
Defer the postmortem draft. The on-call writes the timeline later; the analytical work waits until they’re rested.
Per-cool-down activity guidance. Documented so the on-call doesn’t fall into work by reflex.

Making it stick

Cool-downs only stick if culture and tracking enforce them. That takes three things: manager enforcement, because engineers self-impose poorly; a public norm, where the cool-down is announced and the stigma is removed; and tracking, where cool-downs that aren’t taken get flagged, because powering through is risk, not heroism.

Manager enforcement. Engineers self-impose poorly; managers must require the rest.
Public norm. The cool-down is announced to the team ("I’m cooling down for an hour after that sev 1"), which removes stigma.
Track usage. Cool-downs that aren’t taken get flagged; powering through is risk, not heroism.
Per-team cool-down audit. A quarterly review of cool-down adherence reinforces the culture.
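
The tracking and audit steps above can be sketched as a small report over cool-down records. This is a hedged illustration under stated assumptions: the CooldownRecord fields and both function names are hypothetical, and where the records come from (an incident tracker, a spreadsheet) is left open.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CooldownRecord:
    incident_id: str
    engineer: str
    triggered: bool  # did the incident meet the documented trigger criteria?
    taken: bool      # did the on-call actually take the cool-down?


def skipped_cooldowns(records: List[CooldownRecord]) -> List[CooldownRecord]:
    """Flag cool-downs that were triggered but not taken."""
    return [r for r in records if r.triggered and not r.taken]


def adherence_rate(records: List[CooldownRecord]) -> Optional[float]:
    """Share of triggered cool-downs that were taken, or None if none were triggered."""
    triggered = [r for r in records if r.triggered]
    if not triggered:
        return None
    return sum(r.taken for r in triggered) / len(triggered)
```

In a quarterly review, the skipped list gives managers the specific incidents to follow up on, while the adherence rate shows whether the norm is holding over time.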