Route 53 Failover Strategies
Route 53 supports several routing policies that can serve as failover strategies. This section gives a decision rule for each use case.
Primary-secondary failover
Two records: primary and secondary. Primary serves while healthy. Secondary takes over on health-check failure.
Best for active-passive setups. Primary is the production region; secondary is the failover region.
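Health checks evaluate endpoint reachability. Endpoint checks use HTTP, HTTPS, or TCP. A health check can also be calculated from the status of other health checks, or tied to a CloudWatch alarm on a metric. The request interval is 30 seconds (standard) or 10 seconds (fast).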
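A minimal sketch of how the primary and secondary pair might be expressed as a change batch. The dict follows the shape that boto3's change_resource_record_sets accepts as ChangeBatch. The helper name failover_change_batch, the hostname, the IP addresses, and the health-check ID are placeholders, not values from this document. Nothing here calls AWS.

```python
def failover_change_batch(name, primary_ip, secondary_ip, health_check_id, ttl=60):
    """Build a ChangeBatch dict for a PRIMARY/SECONDARY pair of A records."""

    def record(role, ip, check_id=None):
        rrset = {
            "Name": name,
            "Type": "A",
            "SetIdentifier": f"{name}-{role.lower()}",
            "Failover": role,  # "PRIMARY" or "SECONDARY"
            "TTL": ttl,
            "ResourceRecords": [{"Value": ip}],
        }
        # Only the primary needs a health check for failover to trigger.
        if check_id is not None:
            rrset["HealthCheckId"] = check_id
        return {"Action": "UPSERT", "ResourceRecordSet": rrset}

    return {
        "Comment": "active-passive failover pair",
        "Changes": [
            record("PRIMARY", primary_ip, health_check_id),
            record("SECONDARY", secondary_ip),
        ],
    }


# Placeholder values for illustration only.
batch = failover_change_batch(
    "app.example.com.", "192.0.2.10", "198.51.100.10", "hc-primary-placeholder"
)
```

With real values, the result could be passed as ChangeBatch=batch alongside a HostedZoneId to the boto3 route53 client's change_resource_record_sets call.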
Weighted routing
Distribute traffic by configurable weights. For example, split 70/30 between two regions, or gradually shift traffic from an old version to a new one.
Best for canary rollouts at the DNS layer. Useful when application-layer canary is impractical.
Limitation: resolvers cache answers, so a weight change takes full effect only after the TTL window has passed. Set short TTLs (for example 60s) while actively shifting weights.
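A rough sketch of the arithmetic behind weights: each record receives its weight divided by the sum of all weights. The function names traffic_share and shift_schedule are hypothetical, and the sketch models only the expected split, not DNS caching.

```python
from fractions import Fraction


def traffic_share(weights):
    """Return each record's expected share of responses: weight / sum of weights."""
    total = sum(weights.values())
    if total == 0:
        # The all-zero case is not modelled in this sketch.
        raise ValueError("at least one weight must be non-zero")
    return {name: Fraction(w, total) for name, w in weights.items()}


def shift_schedule(steps, total=100):
    """List (old_weight, new_weight) pairs moving from all-old to all-new."""
    return [(total - total * i // steps, total * i // steps) for i in range(steps + 1)]


shares = traffic_share({"old": 70, "new": 30})
schedule = shift_schedule(4)
```

Each step in the schedule would be applied as a weight update, waiting at least one TTL between steps so cached answers expire before the next shift.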
Latency-based routing
Route each user to the AWS region with lowest measured latency from their location. Good baseline for global active-active.
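Route 53 relies on latency measurements collected over time between networks and AWS regions, not a live probe of each user, and answers with the region that has the lowest latency for the source of the query. Each region needs its own latency record, but no per-user configuration is required.
Not the same as anycast. Latency-based routing is DNS-level; anycast is BGP-level. DNS changes propagate as cached answers expire (TTL windows); anycast rerouting follows BGP convergence, typically within seconds.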
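The selection rule can be sketched as picking the lowest-latency region among those that are currently healthy. The latency table below is invented for illustration, and pick_region is a hypothetical helper, not part of any AWS API.

```python
def pick_region(latency_ms, healthy):
    """Return the healthy region with the lowest latency, or None if none is healthy."""
    candidates = {region: ms for region, ms in latency_ms.items() if region in healthy}
    if not candidates:
        return None
    return min(candidates, key=candidates.get)


# Hypothetical latencies (milliseconds) as seen from one client network.
latency_ms = {"us-east-1": 80, "eu-west-1": 25, "ap-southeast-1": 210}

best = pick_region(latency_ms, healthy={"us-east-1", "eu-west-1", "ap-southeast-1"})
fallback = pick_region(latency_ms, healthy={"us-east-1", "ap-southeast-1"})
```

The second call shows the active-active benefit: when the closest region is unhealthy, the next-best region is chosen instead.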
Geolocation routing
Route by user's geographic location. Often used for compliance: EU users routed to EU regions.
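A weaker performance signal than latency-based routing, since geographic proximity does not guarantee low latency. More predictable for compliance, because the mapping from location to region is explicit.
Combinations: route by geolocation to a country or continent first, then by latency among the regions that serve it. The two policies can be chained, for example with alias records that point from geolocation records to latency records.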
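A sketch of most-specific-match resolution for geolocation records, assuming the lookup order subdivision, country, continent, then default. The tagged tuple keys and the region assignments are illustrative only. Keys are tagged because country and continent codes can collide (NA is both Namibia and North America).

```python
def resolve_geo(records, continent, country, subdivision=None):
    """Return the target of the most specific geolocation record that matches."""
    lookups = []
    if subdivision is not None:
        lookups.append(("subdivision", country, subdivision))
    lookups += [("country", country), ("continent", continent), ("default",)]
    for key in lookups:
        if key in records:
            return records[key]
    return None  # no matching record and no default


# Hypothetical record set: German users pinned to one region, the rest of
# Europe to another, everyone else to a default.
records = {
    ("country", "DE"): "eu-central-1",
    ("continent", "EU"): "eu-west-1",
    ("default",): "us-east-1",
}

german = resolve_geo(records, "EU", "DE")
french = resolve_geo(records, "EU", "FR")
japanese = resolve_geo(records, "AS", "JP")
```

Without a default record, a query from an unmapped location gets no answer in this sketch, which is why a default is usually worth defining.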
Picking a strategy
Active-passive failover: primary-secondary. Simplest; well-understood.
Canary or migration: weighted. Gradual control over traffic split.
Global active-active for performance: latency-based. Default for performance-sensitive global apps.
Compliance-driven: geolocation. When regulation, not performance, is the constraint.
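The decision rule above can be written as a simple lookup. The use-case keys and the recommend helper are hypothetical labels for the four cases listed, not Route 53 terminology.

```python
# Use case -> routing policy, mirroring the four rules above.
STRATEGY_BY_USE_CASE = {
    "active-passive": "failover",
    "canary-or-migration": "weighted",
    "global-active-active": "latency",
    "compliance": "geolocation",
}


def recommend(use_case):
    """Return the routing policy suggested for a use case."""
    try:
        return STRATEGY_BY_USE_CASE[use_case]
    except KeyError:
        raise ValueError(f"unknown use case: {use_case!r}") from None


choice = recommend("canary-or-migration")
```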