The Vendor Failure Drill
Quarterly drill: assume your largest vendor goes down. The drill format, the gaps it surfaces, and the runbook updates that follow.
Scenario
Pick the highest-impact vendor, simulate the failure, bound the drill at 2 hours. The point is to find the gaps before the real failure forces the team to find them under stress.
- Highest-impact vendor. Payment processor, email provider, primary cloud, observability vendor. Pick by blast radius, not by familiarity.
- Simulate the failure. Route 100 percent of traffic through the degraded path, or switch to the alternate provider. The simulation has to be realistic enough to surface real gaps.
- 2-hour drill window. Bounded time. The real on-call team practices the response without the drill stretching into the working day.
- Documented scenario. Named vendor and failure mode. Supports later replication and cross-team comparison.
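One way to simulate the failure in code is a thin wrapper around the vendor call. The sketch below is a minimal illustration, not any vendor's API: `with_drill_failure`, `VendorUnavailable`, and the `failure_rate` parameter are hypothetical names, and it assumes the vendor call and its alternate can both be expressed as plain functions.

```python
import random
from typing import Any, Callable, Optional


class VendorUnavailable(Exception):
    """Raised when the simulated outage blocks a call and no fallback exists."""


def with_drill_failure(
    primary: Callable[..., Any],
    fallback: Optional[Callable[..., Any]],
    failure_rate: float,
    rng: Optional[random.Random] = None,
) -> Callable[..., Any]:
    """Wrap a vendor call so a fraction of requests see a simulated outage.

    Hypothetical helper for illustration; all names are assumptions.
    """
    if not 0.0 <= failure_rate <= 1.0:
        raise ValueError("failure_rate must be between 0.0 and 1.0")
    source = rng if rng is not None else random.Random()

    def call(*args: Any, **kwargs: Any) -> Any:
        # random() returns a value in [0.0, 1.0), so a rate of 1.0
        # fails every call and a rate of 0.0 fails none.
        if source.random() < failure_rate:
            if fallback is None:
                # This is the "no alternate provider" case the drill exposes.
                raise VendorUnavailable("primary is down and no fallback is configured")
            return fallback(*args, **kwargs)
        return primary(*args, **kwargs)

    return call
```

With `failure_rate=1.0` every call takes the alternate path, which matches a full outage. A lower rate approximates partial degradation. Passing `fallback=None` shows what happens when no alternate provider exists.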
Common gaps
Three gaps surface predictably in vendor-failure drills. Each one is fixable, but most teams discover them only when the real vendor outage hits.
- No alternate provider. Many vendors have no practical alternative. The drill confirms whether the team can survive without them.
- Stale runbook. Failover steps no longer work. Backup config drifted; runbook still describes the old path.
- Unclear customer comms. “Who tells customers?” often has no answer until the drill forces the question.
- Captured gap list. Each gap surfaces as a written finding with an owner. The list drives targeted runbook updates.
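The gap list can be kept as structured records so that findings without an owner are easy to spot. This is a minimal sketch under assumptions: the `Gap` fields and the category strings are hypothetical, chosen to mirror the gaps named above.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Gap:
    """One written finding from a drill (hypothetical record shape)."""
    category: str                # e.g. "no-alternate-provider", "stale-runbook", "unclear-comms"
    finding: str                 # what the drill actually showed
    owner: Optional[str] = None  # person accountable for the fix


def unowned(gaps: List[Gap]) -> List[Gap]:
    """Return the findings that still have no owner."""
    return [g for g in gaps if not g.owner]


def count_by_category(gaps: List[Gap]) -> Dict[str, int]:
    """Count findings per category, useful for comparing drills."""
    return dict(Counter(g.category for g in gaps))
```

Running `unowned` at the end of the drill gives a short list of findings to assign before the team disperses.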
Output
The drill produces three concrete outputs that justify the time: updated runbooks, investment decisions, and team confidence.
- Updated runbooks. The per-step “did not work” list becomes the runbook-fix backlog. Each item gets an owner and a deadline.
- Investment decisions. “We cannot survive this” outcomes become budget conversations. Real numbers replace hand-waving.
- Team confidence. Teams that have drilled respond more calmly to the real outage. The muscle memory shows.
- Published outcomes. Action items and their owners are published after each drill. Accountability stays visible across quarters.
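Turning per-step drill results into a backlog with owners and deadlines can be sketched as follows. The names `StepResult`, `ActionItem`, and `build_backlog`, and the 30-day default, are assumptions for illustration; the source does not prescribe a deadline length.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Dict, List


@dataclass
class StepResult:
    """Outcome of one runbook step during the drill (hypothetical shape)."""
    step: str
    worked: bool
    note: str = ""


@dataclass
class ActionItem:
    """One runbook fix with an owner and a deadline."""
    step: str
    note: str
    owner: str
    deadline: date


def build_backlog(
    results: List[StepResult],
    owners: Dict[str, str],
    drill_date: date,
    days_to_fix: int = 30,  # assumed default, not taken from the source
) -> List[ActionItem]:
    """Create one action item per step that did not work."""
    backlog = []
    for result in results:
        if result.worked:
            continue
        owner = owners.get(result.step)
        if not owner:
            # Refuse to create an unowned item so nothing enters the backlog orphaned.
            raise ValueError(f"no owner assigned for failed step: {result.step}")
        backlog.append(
            ActionItem(
                step=result.step,
                note=result.note,
                owner=owner,
                deadline=drill_date + timedelta(days=days_to_fix),
            )
        )
    return backlog
```

Raising on a missing owner is a deliberate choice in this sketch: it forces the ownership question during the drill review rather than after publication.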