Cloud & Infrastructure | Intermediate | By Samson Tanimawo, PhD | Published Dec 9, 2026 | 10 min read

Cloud Provider Outage Playbook: Twelve Hours, Four Stages

Cloud outages are inevitable. The teams that handle them well work from a playbook; the teams that improvise burn out.

Stage 1: Confirm

First task: confirm that the outage is real and that it is the cloud provider, not you. Check the provider's status page, check your monitoring from outside the affected region, and rule out internal causes.

This step is short but critical. Accepting ‘a cloud outage’ as the diagnosis without confirming it can waste the first hour if the real cause turns out to be internal.
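The three checks above can be written down as a small decision function, so the on-call engineer records the evidence instead of guessing. This is a minimal sketch under stated assumptions: the `Signals` fields, the `classify` function, and the four labels it returns are hypothetical names chosen for illustration, and the ordering of the checks (internal cause first) is one reasonable choice, not a prescribed method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signals:
    """Evidence gathered during Stage 1 (all field names are illustrative)."""
    provider_reports_incident: bool  # status page shows an incident for the region
    external_probes_failing: bool    # checks run from outside the affected region fail
    internal_cause_found: bool       # e.g. a recent deploy or config change explains it


def classify(s: Signals) -> str:
    """Return a working diagnosis from the Stage 1 evidence."""
    if s.internal_cause_found:
        # An internal explanation takes priority (assumption of this sketch):
        # fix what you control before blaming the provider.
        return "internal"
    if not s.external_probes_failing:
        # Outside observers see a healthy service, so the outage is not confirmed.
        return "not-confirmed"
    if s.provider_reports_incident:
        return "provider-outage"
    # Failing from outside, no internal cause, provider silent so far.
    return "suspected-provider"


if __name__ == "__main__":
    print(classify(Signals(True, True, False)))
```

Keeping the evidence in one record also gives the post-incident review a clear statement of what was known when the diagnosis was made.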

Stage 2: Communicate

Stage 3: Mitigate

What can you actually do during a region outage? Fail over to a healthy region (if your architecture supports it), drain traffic to a standby, or scale up surviving capacity.

The actions are determined by your architecture. The runbook should match what you actually have, not what you wish you had.
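One way to keep the runbook honest is to derive the list of mitigation steps from a description of what is actually deployed. The sketch below assumes three hypothetical capability flags on an `Architecture` record; the flag names, the `mitigation_plan` function, and the fallback message are illustrative and not taken from any particular tool.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Architecture:
    """What the environment can really do today (illustrative flags)."""
    has_healthy_region: bool    # a second region is deployed and tested
    has_standby: bool           # a standby exists that can take drained traffic
    can_scale_survivors: bool   # surviving capacity has headroom or quota to grow


def mitigation_plan(arch: Architecture) -> List[str]:
    """List only the actions this architecture supports."""
    steps: List[str] = []
    if arch.has_healthy_region:
        steps.append("fail over to a healthy region")
    if arch.has_standby:
        steps.append("drain traffic to standby")
    if arch.can_scale_survivors:
        steps.append("scale up surviving capacity")
    if not steps:
        # Nothing to execute: the honest runbook entry is to wait and communicate.
        steps.append("wait for provider recovery and keep communicating")
    return steps


if __name__ == "__main__":
    for step in mitigation_plan(Architecture(True, False, True)):
        print(step)
```

If the plan comes back with only the fallback entry, that is a finding about the architecture, and it is better discovered before the outage than during it.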

Stage 4: Recover

As the cloud recovers, ramp gradually. Bringing all traffic back at once will overload the recovering services and trigger a recovery cascade.
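A gradual ramp can be expressed as a schedule of traffic percentages plus a rule for when to advance. The sketch below is one possible shape, not a recommendation of specific numbers: the 5% starting point, the doubling factor, the 1% error threshold, and the function names `ramp_schedule` and `next_step` are all assumptions made for illustration.

```python
from typing import List


def ramp_schedule(start_pct: int = 5, factor: int = 2) -> List[int]:
    """Build a list of traffic percentages that grows geometrically to 100."""
    if not 0 < start_pct <= 100:
        raise ValueError("start_pct must be in (0, 100]")
    if factor <= 1:
        raise ValueError("factor must be greater than 1")
    steps: List[int] = []
    pct = start_pct
    while pct < 100:
        steps.append(pct)
        pct *= factor
    steps.append(100)  # always finish at full traffic
    return steps


def next_step(index: int, last_index: int, error_rate: float,
              max_error_rate: float = 0.01) -> int:
    """Advance one step when healthy, back off one step when errors are high."""
    if error_rate > max_error_rate:
        return max(index - 1, 0)
    return min(index + 1, last_index)


if __name__ == "__main__":
    schedule = ramp_schedule()
    print(schedule)
    # Healthy at step 2: move to step 3. Unhealthy at step 2: drop to step 1.
    print(next_step(2, len(schedule) - 1, error_rate=0.002))
    print(next_step(2, len(schedule) - 1, error_rate=0.05))
```

Stepping back on a high error rate, instead of merely pausing, gives the recovering service room to stabilise before the next attempt.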

Post-incident: own your impact even though the root cause was the cloud. Customers do not care about the boundary.

Antipatterns

What to do this week

Three moves. (1) Pick the most exposed instance of the pattern in your environment. (2) Apply the lightest fix and measure for one week. (3) Schedule a quarterly review so the discipline does not rot.