On-Call Handoff Template
Eight fields. Seven minutes. The difference between the next on-call walking into a clean shift and walking into a landmine. Copy this, fill it in, post it in the channel.
The template
Drop this into the on-call channel at the end of your shift. Don't write it from scratch on a Friday after a long shift; update the live fields throughout the week so the final pass takes only seven minutes.
- 1. Shift summary. One line. "Quiet, two false-positive pages on payment-svc, no customer impact."
- 2. Open incidents. Each with link, severity, current state, owner, next action.
- 3. Suppressed alerts & silences. Anything snoozed and when it expires.
- 4. In-flight changes & freezes. Active deploys, freezes in effect, scheduled change windows.
- 5. Weekend / overnight risks. Known stuff that might page; capacity headroom; expiring certs.
- 6. New runbooks / changes since last shift. What the next person doesn't know yet.
- 7. Contacts & escalation path. Who's the secondary; how to reach Eng leadership.
- 8. Outstanding follow-ups. Action items from this shift's incidents that aren't done yet.
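If the handoff is kept as structured data during the week, the eight fields above can be rendered into a channel post at the end of the shift. The following is a minimal sketch, not a prescribed tool: the `Handoff` class, its attribute names, and the `render` function are all hypothetical, chosen only to mirror the field list.

```python
from dataclasses import dataclass, field, fields


@dataclass
class Handoff:
    """The eight template fields, in posting order (names are illustrative)."""
    shift_summary: str = ""
    open_incidents: list[str] = field(default_factory=list)
    silences: list[str] = field(default_factory=list)
    in_flight_changes: list[str] = field(default_factory=list)
    risks: list[str] = field(default_factory=list)
    new_runbooks: list[str] = field(default_factory=list)
    contacts: list[str] = field(default_factory=list)
    follow_ups: list[str] = field(default_factory=list)


def render(handoff: Handoff) -> str:
    """Render a numbered plain-text post; empty fields say 'none' explicitly."""
    lines = []
    for number, f in enumerate(fields(handoff), start=1):
        value = getattr(handoff, f.name)
        label = f.name.replace("_", " ").capitalize()
        if isinstance(value, list):
            text = "; ".join(value) if value else "none"
        else:
            text = value or "none"
        lines.append(f"{number}. {label}: {text}")
    return "\n".join(lines)


if __name__ == "__main__":
    shift = Handoff(
        shift_summary="Quiet, two false-positive pages on payment-svc, no customer impact.",
    )
    print(render(shift))
```

Writing "none" for an empty field is a deliberate choice in this sketch: the next on-call can tell "nothing to report" apart from "forgot to fill it in".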
Open incidents
The most important field. Skip anything below this if you have to; never skip this.
- For each open incident: link, severity, status (mitigated / investigating / monitoring), owner, next action with timestamp.
- "Next action with timestamp" is the magic phrase. "Wait for vendor reply, ETA 02:00 UTC" is way more useful than "ongoing".
- Mitigated ≠ resolved. If it's mitigated but the root cause isn't fixed, flag it. The next on-call needs to know it can come back.
- Customer-facing comms status: was a public statuspage update sent? When? Is one due?
- Workarounds in place: if you've routed traffic around something or scaled up "temporarily", say so. Temporary fixes outlive their authors.
- Link to the incident channel / war room thread, not just the ticket. Context lives in the channel.
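The checks in this list are mechanical enough to lint before posting. Here is a small sketch under stated assumptions: `OpenIncident`, its attributes, and `handoff_warnings` are hypothetical names, and the status values are the three named above. It flags a next action without a timestamp, a mitigation without a root-cause fix, and a missing channel link.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class OpenIncident:
    link: str
    severity: str
    status: str  # "mitigated", "investigating", or "monitoring"
    owner: str
    next_action: str
    next_action_at: Optional[datetime]  # None means no timestamp was given
    root_cause_fixed: bool = False
    channel_link: str = ""  # incident channel / war room thread


def handoff_warnings(inc: OpenIncident) -> list[str]:
    """Return the gaps the next on-call would trip over."""
    warnings = []
    if inc.next_action_at is None:
        warnings.append("next action has no timestamp")
    if inc.status == "mitigated" and not inc.root_cause_fixed:
        warnings.append("mitigated but root cause not fixed; it can come back")
    if not inc.channel_link:
        warnings.append("no incident channel link, only the ticket")
    return warnings


if __name__ == "__main__":
    incident = OpenIncident(
        link="INC-EXAMPLE",  # placeholder, not a real ticket
        severity="Sev-2",
        status="mitigated",
        owner="alice",
        next_action="Wait for vendor reply",
        next_action_at=datetime(2024, 6, 1, 2, 0, tzinfo=timezone.utc),
    )
    for warning in handoff_warnings(incident):
        print(warning)
```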
Suppressed alerts & silences
Silenced alerts are the biggest source of "we should have known". Two questions: what's silenced, and when does the silence expire?
- List every active silence: alert name, scope, expiry timestamp, reason, and the human who silenced it.
- Anything silenced for more than 24 hours is a runbook task, not an alert. Convert it or delete it.
- Silences expiring during the next shift: flag them explicitly. "X silence expires Sat 06:00, alert may resume."
- If you silenced something, mention why. "Known bug, fix in #PR-1234" is way more useful than "noise".
- If a silence has been renewed more than 3 times, it's a process failure. Add it to follow-ups.
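The three thresholds in this list (longer than 24 hours, expiring during the next shift, renewed more than 3 times) can be applied to an exported list of silences. The sketch below is illustrative only: the `Silence` record and `review` function are hypothetical, and it assumes "silenced for more than 24 hours" means the span from creation to expiry. How silences are exported from your alerting system is left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Silence:
    alert: str
    scope: str
    created_at: datetime
    expires_at: datetime
    reason: str
    silenced_by: str
    renewals: int = 0


def review(s: Silence, shift_start: datetime, shift_end: datetime) -> list[str]:
    """Apply the three handoff rules to one silence."""
    notes = []
    # Assumption: duration is measured from creation to expiry.
    if s.expires_at - s.created_at > timedelta(hours=24):
        notes.append("silenced for more than 24 hours: convert to a runbook task or delete")
    if shift_start <= s.expires_at < shift_end:
        notes.append(f"expires during next shift at {s.expires_at.isoformat()}, alert may resume")
    if s.renewals > 3:
        notes.append("renewed more than 3 times: add to follow-ups")
    return notes


if __name__ == "__main__":
    start = datetime(2024, 6, 1, 0, 0, tzinfo=timezone.utc)
    end = start + timedelta(hours=12)
    silence = Silence(
        alert="disk-full", scope="db-1",
        created_at=start - timedelta(hours=30),
        expires_at=start + timedelta(hours=6),
        reason="Known bug", silenced_by="alice", renewals=4,
    )
    for note in review(silence, start, end):
        print(f"{silence.alert} ({silence.scope}): {note}")
```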
In-flight changes & freezes
- Active deploys: what's currently rolling out, what stage (canary / progressive / full), expected completion time.
- Active freezes: production code freeze in effect? Until when? What's the exception process?
- Scheduled change windows: vendor maintenance, DB upgrade, cert rotation. Time + scope.
- Recent rollbacks: if you rolled something back, the next on-call needs to know it's still rolled back. Otherwise they'll find a "weird discrepancy" and panic.
- Feature flags flipped during the shift: flag name, environment, why. Especially anything flipped during an incident.
Weekend / overnight risks
The "what might page you" field. Predict the night so the next person isn't surprised.
- Capacity headroom: is any service running with less than 30% headroom on CPU or memory? It will page on traffic spikes.
- Approaching limits: quotas (cloud, vendor), license counts, disk space, certificates expiring within 30 days.
- Recent regressions in monitoring: an alert that has fired more often than usual without a lasting fix, even if each firing was mitigated.
- Expected traffic: marketing campaign, sale, scheduled batch job. Note any predicted load above baseline.
- Vendor incidents you're tracking: statuspage links, expected resolution. Helps the next on-call dismiss upstream-caused noise faster.
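Two of the risks above come with numeric thresholds (less than 30% headroom, certificates expiring within 30 days), so they can be computed rather than eyeballed. This is a rough sketch with hypothetical function names and inputs. It assumes utilization is a fraction between 0 and 1, so headroom is `1 - utilization`, and it does not cover where the numbers come from.

```python
from datetime import date, timedelta

HEADROOM_THRESHOLD = 0.30          # flag services with less than 30% headroom
CERT_WINDOW = timedelta(days=30)   # flag certs expiring within 30 days


def low_headroom(utilization: dict[str, float]) -> list[str]:
    """Services whose utilization (0.0 to 1.0) leaves under 30% headroom."""
    return sorted(
        name for name, used in utilization.items()
        if 1.0 - used < HEADROOM_THRESHOLD
    )


def expiring_certs(expiry: dict[str, date], today: date) -> list[str]:
    """Hosts whose certificate expires within 30 days (or already has)."""
    return sorted(
        host for host, not_after in expiry.items()
        if not_after - today <= CERT_WINDOW
    )


if __name__ == "__main__":
    # Example inputs are made up for illustration.
    print(low_headroom({"payment-svc": 0.85, "search": 0.50}))
    print(expiring_certs(
        {"api.example.com": date(2024, 6, 20), "www.example.com": date(2024, 9, 1)},
        today=date(2024, 6, 1),
    ))
```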
Contacts & escalation
- Primary & secondary on-call for the next shift, with phone / pager handles, not just usernames.
- Subject-matter experts: for example, "Database is owned by the data-platform team; their on-call is in #data-oncall". Avoids the 3am search.
- Engineering leadership escalation: whose phone you call for a Sev-1 outside business hours.
- Customer-facing comms approval: who has to sign off on a status page update. Include the PR/Marketing on-call rotation if there is one.
- Vendor support contacts: account numbers, support tier, after-hours phone. A number buried in a wiki is too slow at 3am.
- Outstanding follow-ups (field 8): a bullet list with assignees and target dates. Roll this forward shift to shift until items close.