SRE Best Practices Practical By Samson Tanimawo, PhD Published May 16, 2026 4 min read

The Vendor Failure Drill

Quarterly drill: assume your largest vendor goes down. The drill format, the gaps it surfaces, and the runbook updates that follow.

Scenario

Pick the vendor whose failure would hurt most: payment processor, email provider, primary cloud, observability vendor.

Simulate the failure: route 100% of traffic through degradation paths or alternate providers.

2-hour drill window. Real on-call practices the response.

Common gaps

No alternate provider. Some vendors do not have practical alternatives; the drill confirms whether the team can survive without them.

Stale runbook. The 'failover to backup' steps no longer work because the backup config drifted.

Communication: who tells customers? Often unclear until the drill.

Output

Updated runbooks. The drill reveals every step that did not work; each becomes an action item.

Investment decisions. Sometimes the drill says 'we cannot survive this vendor's failure'; that is a budget conversation.

Confidence. Teams that have drilled are calmer when the real thing happens.