The On-Call Cool-Down Period
After a major incident, the on-call engineer takes a cool-down: a short, mandatory rest that reduces secondary errors.
The cool-down protocol
After a major incident, the on-call engineer gets an explicit rest period, typically 30-60 minutes and longer for the most severe incidents, before resuming normal work. The rest is mandatory.
Rest reduces secondary errors. A tired on-call making decisions immediately after a sev 1 is at higher risk of compounding the incident.
The backup on-call covers during the cool-down. This is not time off; it is an explicit handoff for a bounded window.
When to invoke
After sev 1 incidents. Always.
After sev 2 incidents that ran more than 4 hours. Long durations are draining.
After multiple consecutive sev 3 incidents. The cumulative load is the issue.
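The three triggers above can be sketched as a small check. This is an illustrative sketch, not an existing tool: the `Incident` type, the `needs_cooldown` function, and the threshold of two consecutive sev 3 incidents are all assumptions, since the text says "multiple" without giving a number.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class Incident:
    severity: int        # 1 = sev 1, 2 = sev 2, 3 = sev 3
    duration: timedelta  # time from page to resolution


# Assumed value: the protocol says "multiple consecutive" without a number.
CONSECUTIVE_SEV3_THRESHOLD = 2


def needs_cooldown(shift_incidents: list[Incident]) -> bool:
    """Return True if the most recent incident triggers a cool-down.

    shift_incidents holds one on-call shift, ordered oldest to newest.
    """
    if not shift_incidents:
        return False
    latest = shift_incidents[-1]
    if latest.severity == 1:
        return True  # sev 1: always
    if latest.severity == 2 and latest.duration > timedelta(hours=4):
        return True  # long-running sev 2
    if latest.severity == 3:
        # Count the unbroken run of sev 3 incidents ending at the latest one.
        run = 0
        for incident in reversed(shift_incidents):
            if incident.severity != 3:
                break
            run += 1
        return run >= CONSECUTIVE_SEV3_THRESHOLD
    return False
```

A team would tune the sev 3 threshold to its own paging volume; the point is that the decision is rule-based rather than left to the engineer's judgment in the moment.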
How long
30 minutes minimum. Longer for severe incidents (multi-hour sev 1, customer-facing data issues).
Up to 2 hours for incidents that involved customer impact or required coordination with leadership.
Half-day for catastrophic incidents (data loss, major outages, security events). The recovery is real.
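The duration tiers can be written down as a lookup so nobody has to negotiate the length after an incident. A minimal sketch under stated assumptions: the function name and flags are hypothetical, "half-day" is taken to mean 4 hours, and the 60-minute upper bound for the base tier comes from the 30-60 minute figure in the protocol description.

```python
from datetime import timedelta

# Assumption: "half-day" is interpreted as 4 hours.
HALF_DAY = timedelta(hours=4)


def cooldown_window(
    catastrophic: bool = False,
    customer_impact: bool = False,
    leadership_coordination: bool = False,
) -> tuple[timedelta, timedelta]:
    """Return (minimum, maximum) cool-down length for an incident."""
    if catastrophic:
        # Data loss, major outages, security events.
        return (HALF_DAY, HALF_DAY)
    if customer_impact or leadership_coordination:
        return (timedelta(minutes=30), timedelta(hours=2))
    # Base tier: 30 minutes minimum, typically up to an hour.
    return (timedelta(minutes=30), timedelta(minutes=60))
```

The most severe applicable tier wins, so a catastrophic incident with customer impact still gets the half-day.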
What to do during cool-down
Step away from the keyboard. Walk; eat; rest. Anything except continued incident work.
A brief debrief with the team is fine if it helps process the experience; it is optional.
Defer the postmortem first draft. The on-call writes the timeline; the analytical work waits until they're rested.
Making it stick
Manager enforcement. Engineers rarely impose rest on themselves; managers must require it.
Public norm. Engineers announce their cool-down to the team: 'I'm cooling down for an hour after that sev 1.' Saying it openly removes stigma.
Track usage. Flag cool-downs that were warranted but not taken. Powering through is not heroism; it's risk.
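Tracking can be as simple as a periodic pass over incident records. The sketch below assumes a hypothetical `CooldownRecord` shape and treats anything under the 30-minute minimum as skipped; neither the record fields nor the function name comes from an existing system.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CooldownRecord:
    incident_id: str
    engineer: str
    cooldown_required: bool
    cooldown_minutes_taken: Optional[int]  # None means no cool-down was logged


MINIMUM_MINUTES = 30  # the protocol's stated minimum


def flag_skipped(records: list[CooldownRecord]) -> list[CooldownRecord]:
    """Return records where a required cool-down was missing or too short."""
    flagged = []
    for record in records:
        if not record.cooldown_required:
            continue
        taken = record.cooldown_minutes_taken or 0
        if taken < MINIMUM_MINUTES:
            flagged.append(record)
    return flagged
```

The flagged list is meant for the manager's follow-up, not as a mark against the engineer.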