By Samson Tanimawo, PhD. Published Jun 25, 2026.

Recovering From a Saturated On-Call

When the on-call has been pinned for three or more days, normal recovery does not work. This is a 5-step protocol for getting the team back to baseline.

The 5 steps

Step 1: pause non-critical work. That means marketing campaigns, feature launches, anything that adds complexity. Pausing buys cognitive room.

Step 2: bring in additional on-call from another team for 48 hours. Lets the burned-out engineer actually rest.

Step 3: triage the incident backlog. Some items become 'will not fix'; the rest get owners.

Step 4: identify the burning fire, the single thing causing repeat alerts. Fix it before resuming normal cadence.

Step 5: resume normal operation only after a full quiet shift. Premature resumption produces relapse.
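Steps 4 and 5 can be partly mechanized if the team keeps a log of pages. The following is a minimal sketch, not a definitive implementation. It assumes a hypothetical page log of (alert name, timestamp) pairs, treats a "quiet shift" as a shift with zero pages, and assumes a 12-hour shift. None of those details come from the protocol itself, so adjust them to your rotation.

```python
from collections import Counter
from datetime import datetime, timedelta

# Hypothetical page log: (alert name, time the page fired).
# Names and timestamps are illustrative only.
pages = [
    ("disk-full-db1", datetime(2026, 6, 22, 2, 10)),
    ("api-latency", datetime(2026, 6, 22, 3, 40)),
    ("disk-full-db1", datetime(2026, 6, 23, 1, 5)),
    ("disk-full-db1", datetime(2026, 6, 23, 4, 30)),
    ("cert-expiry", datetime(2026, 6, 23, 15, 0)),
]


def burning_fire(pages):
    """Step 4: return (alert_name, count) for the alert that paged most often.

    Returns None when the log is empty.
    """
    counts = Counter(name for name, _ in pages)
    if not counts:
        return None
    return counts.most_common(1)[0]


def is_quiet_shift(pages, shift_start, shift_length=timedelta(hours=12)):
    """Step 5: True if no page fired in [shift_start, shift_start + shift_length).

    Assumption: "quiet" means zero pages; the 12-hour default is also assumed.
    """
    shift_end = shift_start + shift_length
    return not any(shift_start <= ts < shift_end for _, ts in pages)


if __name__ == "__main__":
    print(burning_fire(pages))
    print(is_quiet_shift(pages, datetime(2026, 6, 24, 0, 0)))
```

A raw count is a crude proxy: the most frequent alert is a good first candidate for the burning fire, but a human should still confirm it is the root cause rather than a symptom.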

Signs you need this protocol

Three or more sleepless nights in a week. Not just busy; actually unable to sleep through the night.

The on-call is making mistakes that a version of them with 24 hours of rest would not.

Stakeholders questioning team capability. Burned-out teams produce visible quality drops.
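The first sign is the only one with a number attached, so it is the easiest to check automatically. Here is a rough sketch, assuming a hypothetical list of dates on which the on-call's night was interrupted and a rolling 7-day window. Only the threshold of three comes from the text above. The other two signs need human judgment and are not modeled.

```python
from datetime import date, timedelta

# From the text: three or more sleepless nights in a week.
SLEEPLESS_NIGHT_THRESHOLD = 3


def needs_protocol(interrupted_nights, today, window_days=7):
    """True if enough distinct interrupted nights fall in the trailing window.

    interrupted_nights: iterable of dates (hypothetical input format).
    The window covers `window_days` days ending on `today`, inclusive.
    """
    window_start = today - timedelta(days=window_days - 1)
    recent = {n for n in interrupted_nights if window_start <= n <= today}
    return len(recent) >= SLEEPLESS_NIGHT_THRESHOLD


if __name__ == "__main__":
    nights = [date(2026, 6, 19), date(2026, 6, 21), date(2026, 6, 23)]
    print(needs_protocol(nights, today=date(2026, 6, 25)))
```

Duplicate dates are collapsed into a set so that several pages on the same night count as one sleepless night.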

Avoid

Heroism: 'I can power through.' Powering through is how saturated on-calls become quitting on-calls.

Pretending it is normal. The saturation is data; act on it.

Blame. Saturation usually has a system cause; finding it is more useful than finding fault.