The Vendor Failure Drill
Quarterly drill: assume your largest vendor goes down. The drill format, the gaps it surfaces, and the runbook updates that follow.
Scenario
Pick the vendor whose failure would hurt most: payment processor, email provider, primary cloud, observability vendor.
Simulate the failure: route 100% of traffic through degradation paths or alternate providers.
2-hour drill window. Real on-call practices the response.
Common gaps
No alternate provider. Some vendors do not have practical alternatives; the drill confirms whether the team can survive without them.
Stale runbook. The 'failover to backup' steps no longer work because the backup config drifted.
Communication: who tells customers? Often unclear until the drill.
Output
Updated runbooks. The drill reveals every step that did not work; each becomes an action item.
Investment decisions. Sometimes the drill says 'we cannot survive this vendor's failure'; that is a budget conversation.
Confidence. Teams that have drilled are calmer when the real thing happens.