The On-Call Cool-Down Period

After a major incident, the on-call engineer takes a cool-down period. The goal is to reduce secondary errors.

The cool-down protocol

The cool-down protocol is mandatory rest after a major incident: typically 30-60 minutes of explicit recovery before resuming normal work. Rest reduces secondary errors because a tired on-call engineer who makes decisions immediately after a sev 1 is at higher risk of compounding the incident. The backup on-call covers the cool-down window.

When to invoke

The cool-down has predictable triggers: always after a sev 1, after a long sev 2, and after a string of consecutive sev 3 incidents, because cumulative load is itself a source of fatigue. The trigger criteria are documented so that invocation is automatic, not a debate.
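
One way to make the triggers automatic is to write them down as a small check. The sketch below is illustrative only: the names Incident and needs_cooldown are hypothetical, and the two thresholds are assumptions, since the criteria above do not say how long a "long" sev 2 is or how many sev 3 incidents make a "string".

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Sequence

# Assumed thresholds; tune these to your own documented criteria.
LONG_SEV2 = timedelta(hours=2)   # what counts as a "long" sev 2 (assumption)
SEV3_STREAK = 3                  # consecutive sev 3s that count as a "string" (assumption)


@dataclass
class Incident:
    severity: int        # 1, 2, or 3
    duration: timedelta  # time from page to resolution


def needs_cooldown(recent: Sequence[Incident]) -> bool:
    """Decide whether the incident that just closed triggers a cool-down.

    `recent` holds the on-call's incidents for the current shift,
    oldest first; the last entry is the one that just closed.
    """
    if not recent:
        return False
    latest = recent[-1]
    # Always after a sev 1.
    if latest.severity == 1:
        return True
    # After a long sev 2.
    if latest.severity == 2 and latest.duration >= LONG_SEV2:
        return True
    # After a string of consecutive sev 3s: count the trailing run.
    streak = 0
    for incident in reversed(recent):
        if incident.severity != 3:
            break
        streak += 1
    return streak >= SEV3_STREAK
```

Because the check takes the whole shift history rather than a single incident, it can account for cumulative load as well as individual severity.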

How long

Cool-down length scales with severity. The minimum is 30 minutes. It runs longer for a multi-hour sev 1 or a customer-facing data issue, up to 2 hours when leadership coordination was required, and a half-day for catastrophic incidents (data loss, major outages, security events). The need for recovery is real, and shortcutting it produces secondary incidents.
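
The scale above can be sketched as a lookup from incident characteristics to a duration. This is a minimal illustration, not a prescribed policy: IncidentSummary and cooldown_length are hypothetical names, the 60-minute value for "longer" and the 4-hour value for "half-day" are assumptions, and the 2-hour figure is used as a fixed value although the text gives it as an upper bound.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class IncidentSummary:
    catastrophic: bool = False               # data loss, major outage, security event
    leadership_coordination: bool = False    # leadership had to be pulled in
    multi_hour_sev1: bool = False
    customer_facing_data_issue: bool = False


def cooldown_length(summary: IncidentSummary) -> timedelta:
    """Return the cool-down length, checking the most severe case first."""
    if summary.catastrophic:
        return timedelta(hours=4)      # assumed length of a "half-day"
    if summary.leadership_coordination:
        return timedelta(hours=2)      # upper bound from the text, used as-is
    if summary.multi_hour_sev1 or summary.customer_facing_data_issue:
        return timedelta(minutes=60)   # assumed value for "longer"
    return timedelta(minutes=30)       # documented minimum
```

Checking the most severe condition first means an incident that matches several rows gets the longest applicable cool-down.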

What to do during cool-down

Cool-down is recovery, not reduced-intensity work. Step away from the keyboard: walk, eat, rest, anything except continued incident work. A brief debrief with the team is acceptable if it helps process the experience. The first draft of the postmortem can wait until the on-call engineer is rested.

Making it stick

Cool-downs only stick if culture and tracking enforce them. That takes three things: manager enforcement (engineers are poor at imposing rest on themselves), a public norm (announce the cool-down to remove the stigma), and tracking (cool-downs that aren’t taken get flagged, because powering through is a risk, not heroism).
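
The tracking piece could be as simple as comparing the cool-down that was required with the one that was logged. The sketch below assumes a hypothetical record format (CooldownRecord) and a hypothetical helper (flag_skipped); how records are collected is left open.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import List, Optional


@dataclass
class CooldownRecord:
    incident_id: str
    engineer: str
    required: timedelta           # cool-down the policy called for
    taken: Optional[timedelta]    # None when no cool-down was logged


def flag_skipped(records: List[CooldownRecord]) -> List[CooldownRecord]:
    """Return records where the cool-down was skipped or cut short."""
    return [
        r for r in records
        if r.taken is None or r.taken < r.required
    ]
```

The flagged list is meant for the manager's review, in line with the point that enforcement should not rest on the engineer alone.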