Region Failover Patterns Without Active-Active Cost
Active-active is the gold standard and the gold price. For most workloads, cheaper patterns deliver acceptable RTO without doubling spend.
The active-active price tag
Active-active means full capacity in two regions, both serving traffic, with storage replicated between them. That is roughly 2x infrastructure cost. The benefit is near-zero RTO when one region fails.
For workloads where a 5-30 minute RTO is acceptable, you do not need to pay 2x.
Four cheaper patterns
- 1. Warm standby. A smaller fleet runs in the second region and is scaled up on failover. ~30% cost premium over a single region.
- 2. Pilot light. Only replicated data plus minimal infrastructure; compute is brought up on failover. ~10% premium.
- 3. Backup & restore. Restore from a cross-region backup. Hours-long RTO; near-zero cost premium.
- 4. Active-passive with DNS failover. Full capacity in the second region, but it receives no traffic until failover. ~70% premium.
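The premiums above can be turned into rough monthly figures. The following is a minimal sketch, assuming the premiums apply to a single-region baseline and that active-active is a flat +100%; the function names and the baseline figure are hypothetical, and "near-zero" is modeled as zero.

```python
# Cost premiums relative to a single-region baseline, taken from the list above.
# Active-active is modeled as +100% ("roughly 2x"); backup & restore's
# "near-zero" premium is modeled as 0. All figures are rough estimates.
PREMIUMS = {
    "backup_restore": 0.00,
    "pilot_light": 0.10,
    "warm_standby": 0.30,
    "active_passive_dns": 0.70,
    "active_active": 1.00,
}


def monthly_cost(baseline: float, pattern: str) -> float:
    """Estimated monthly spend for a pattern, given single-region spend."""
    return round(baseline * (1 + PREMIUMS[pattern]), 2)


def savings_vs_active_active(baseline: float, pattern: str) -> float:
    """How much less a pattern costs per month than active-active."""
    return round(
        monthly_cost(baseline, "active_active") - monthly_cost(baseline, pattern), 2
    )


if __name__ == "__main__":
    baseline = 40_000.0  # hypothetical monthly single-region spend
    for name in PREMIUMS:
        print(name, monthly_cost(baseline, name), savings_vs_active_active(baseline, name))
```

Plugging in your own baseline makes the tradeoff concrete: the savings column is what you keep each month by accepting a longer RTO.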
Tradeoff: failover time
Active-active: seconds.
Active-passive with DNS failover: minutes, bounded mainly by health-check detection and DNS TTL.
Warm standby: 5-15 minutes.
Pilot light: 15-30 minutes.
Backup & restore: hours.
Pick the pattern that matches the SLO; do not over-buy.
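One way to make "do not over-buy" mechanical is to pick the cheapest pattern whose worst-case failover time still meets the target. This is a sketch under stated assumptions: "seconds" is rounded up to 1 minute, "hours" is given a placeholder of 4 hours, and active-passive is left out because no failover time is quoted for it in the list above. The `Pattern` type and `cheapest_pattern` function are hypothetical names.

```python
from typing import NamedTuple, Optional


class Pattern(NamedTuple):
    name: str
    worst_case_rto_min: float  # upper end of the range quoted above
    cost_premium: float        # fraction of single-region cost


PATTERNS = [
    Pattern("backup_restore", 240.0, 0.00),  # "hours": 4 h is an assumed placeholder
    Pattern("pilot_light", 30.0, 0.10),
    Pattern("warm_standby", 15.0, 0.30),
    Pattern("active_active", 1.0, 1.00),     # "seconds": rounded up to 1 minute
]


def cheapest_pattern(rto_target_min: float) -> Optional[Pattern]:
    """Return the lowest-premium pattern that meets the RTO target, or None."""
    candidates = [p for p in PATTERNS if p.worst_case_rto_min <= rto_target_min]
    if not candidates:
        return None
    return min(candidates, key=lambda p: p.cost_premium)
```

Using the worst-case end of each range is a deliberately conservative choice; replace the numbers with your own measured failover times once you have them.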
Rehearsal as the proof
Untested failover is a story, not a recovery posture. Run tabletop exercises quarterly and a real failover at least annually.
Most teams discover their failover is broken on the first real failover. Rehearsal moves the discovery to a controlled time.
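A rehearsal is most useful when it produces a number. The sketch below times how long the standby region takes to report healthy after failover is triggered. The `is_healthy` callable is an assumption: supply whatever check fits your environment (an HTTP probe, a synthetic transaction). The clock and sleep functions are injectable only so the timer can be tested without waiting.

```python
import time
from typing import Callable


def measure_recovery_seconds(
    is_healthy: Callable[[], bool],
    timeout_s: float = 1800.0,
    poll_interval_s: float = 5.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> float:
    """Poll is_healthy() until it returns True and return the elapsed seconds.

    Raises TimeoutError if the standby is still unhealthy after timeout_s.
    """
    start = clock()
    while True:
        if is_healthy():
            return clock() - start
        if clock() - start >= timeout_s:
            raise TimeoutError(f"standby not healthy after {timeout_s} s")
        sleep(poll_interval_s)
```

Start the timer at the moment failover is declared, not when the first command is run, so that the measurement includes decision and coordination time.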
Antipatterns
- Active-active because “safer.” Pay the price knowingly, not by default.
- Pilot light without rehearsal. Untested infrastructure does not exist on incident day.
- Backup & restore without RTO measurement. The number is always larger than expected.
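One reason the backup & restore number surprises people is that estimates often count only the data copy. A rough sketch of a fuller estimate follows, assuming restore time is the copy time plus a set of fixed overhead phases; the phase names and all figures in the example are hypothetical, and a measured rehearsal should replace them.

```python
from typing import Dict


def restore_rto_minutes(
    data_gb: float,
    throughput_mb_s: float,
    overhead_min: Dict[str, float],
) -> float:
    """Estimate RTO as data copy time plus the sum of overhead phases.

    data_gb: size of the backup to restore, in GB (1 GB = 1024 MB here).
    throughput_mb_s: sustained restore throughput in MB/s.
    overhead_min: minutes per non-copy phase, e.g. detection or validation.
    """
    copy_min = data_gb * 1024 / throughput_mb_s / 60
    return copy_min + sum(overhead_min.values())


if __name__ == "__main__":
    # Hypothetical drill inputs: 2 TB restored at 200 MB/s, plus overhead phases.
    estimate = restore_rto_minutes(
        data_gb=2000,
        throughput_mb_s=200,
        overhead_min={"detect": 10, "provision": 20, "validate": 30, "cutover": 10},
    )
    print(f"estimated RTO: {estimate:.0f} minutes")
```

Comparing this estimate with the time recorded in an actual restore drill shows which phase the estimate missed.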
What to do this week
Three moves. (1) Pick the workload whose failover pattern is least tested or furthest from its RTO target. (2) Apply the lightest fix, such as a timed rehearsal or a measured restore, and track the result for one week. (3) Schedule a quarterly review so the discipline does not rot.