Route 53 Failover Strategies
Route 53 offers several routing policies for failover and traffic steering. Below is what each one does and the use case it fits.
Primary-secondary failover
Primary-secondary is the simplest active-passive pattern Route 53 supports. The secondary record only serves traffic when health checks on the primary fail.
- Two records. One primary, one secondary; Route 53 answers with the primary while its health check passes and with the secondary when it does not.
- Active-passive fit. Primary serves the production region; secondary is the warm standby in another region.
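- Health checks. HTTP, HTTPS, or TCP endpoint checks, calculated checks that combine other health checks, or checks that track a CloudWatch alarm; the request interval is either 10 seconds (fast) or 30 seconds (standard).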
- Failback. Once the primary passes enough consecutive checks (the failure threshold, 3 by default), Route 53 answers with it again; clients return as their cached answers expire. Consider session stickiness if it matters.
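The failover and failback behaviour above can be sketched as a small simulation. It is a simplified model, not Route 53's implementation: it assumes a single health checker (Route 53 actually aggregates results from many checker locations), a failure threshold of 3, and a secondary record with no health check attached. `HealthCheck` and `failover_answer` are hypothetical names, and the IPs are documentation addresses.

```python
from dataclasses import dataclass


@dataclass
class HealthCheck:
    """Single-checker stand-in for a Route 53 health check (simplified)."""
    failure_threshold: int = 3  # consecutive results needed to flip status
    healthy: bool = True
    streak: int = 0  # consecutive results that disagree with `healthy`

    def observe(self, check_passed: bool) -> bool:
        if check_passed == self.healthy:
            self.streak = 0  # result agrees with the current status
        else:
            self.streak += 1
            if self.streak >= self.failure_threshold:
                self.healthy = check_passed  # flip after N consecutive disagreements
                self.streak = 0
        return self.healthy


def failover_answer(primary: HealthCheck, primary_ip: str, secondary_ip: str) -> str:
    """Primary while healthy; otherwise the secondary (assumed to have no health check)."""
    return primary_ip if primary.healthy else secondary_ip


# Simulated probe results, one per request interval (e.g. every 10 s).
check = HealthCheck(failure_threshold=3)
probes = [True, False, False, False, False, True, True, True]
answers = []
for passed in probes:
    check.observe(passed)
    answers.append(failover_answer(check, "192.0.2.10", "198.51.100.20"))
print(answers)
```

With a 10-second interval, three failed probes means roughly 30 seconds before the answer flips, plus up to one TTL before cached clients follow; failback needs the same number of consecutive passing probes.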
Weighted routing
Weighted routing splits traffic by configurable percentages. It is the DNS-layer canary tool when application-layer canary is impractical.
- Split. Assign each record a relative weight from 0 to 255; its share of answers is its weight divided by the group's total (e.g. 70/30 between two regions, or 95/5 for a canary deployment).
- Use case. Gradual migration between versions, regions, or implementations; tune the dial in steps.
- TTL caveat. Weight changes take effect over the TTL window; set TTL to 60s when actively shifting traffic.
- Caching reality. Some resolvers ignore or extend TTLs; expect a long tail of clients still using the old answer, possibly for hours.
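As a sketch of the weight arithmetic, each record's expected share is its weight divided by the sum of weights in the group. The helpers below also build `ResourceRecordSet` dictionaries in the shape boto3's `change_resource_record_sets` accepts. The hostname, IPs, and hosted-zone ID are placeholders, `canary_batch` is a hypothetical helper, and the API call is left commented out so the example runs offline.

```python
def traffic_share(weights: dict[str, int]) -> dict[str, float]:
    """Expected fraction of DNS answers per record: weight / sum of weights."""
    if any(not 0 <= w <= 255 for w in weights.values()):
        raise ValueError("Route 53 weights must be between 0 and 255")
    if not weights:
        return {}
    total = sum(weights.values())
    if total == 0:
        # Route 53 treats a group where every weight is 0 as equal weights.
        return {name: 1 / len(weights) for name in weights}
    return {name: w / total for name, w in weights.items()}


def weighted_record(name: str, set_id: str, ip: str, weight: int, ttl: int = 60) -> dict:
    """One weighted A record in boto3 ResourceRecordSet form (short TTL while shifting)."""
    return {
        "Name": name,
        "Type": "A",
        "SetIdentifier": set_id,
        "Weight": weight,
        "TTL": ttl,
        "ResourceRecords": [{"Value": ip}],
    }


def canary_batch(canary_pct: int) -> dict:
    """UPSERT both records so roughly `canary_pct` percent of answers go to the canary."""
    records = [
        weighted_record("app.example.com.", "stable", "192.0.2.10", 100 - canary_pct),
        weighted_record("app.example.com.", "canary", "198.51.100.20", canary_pct),
    ]
    return {"Changes": [{"Action": "UPSERT", "ResourceRecordSet": r} for r in records]}


# Step the dial 5% -> 25% -> 50%; roll back by UPSERTing a canary weight of 0.
for pct in (5, 25, 50):
    batch = canary_batch(pct)
    # boto3.client("route53").change_resource_record_sets(
    #     HostedZoneId="ZONE_ID_PLACEHOLDER", ChangeBatch=batch)
    print(pct, traffic_share({"stable": 100 - pct, "canary": pct}))
```

Keeping the two weights summing to 100 is only a readability convention; Route 53 normalizes by the total either way.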
Latency-based routing
Latency-based routing sends each client to the AWS region with the lowest measured latency. It is the default for performance-sensitive global apps.
- Mechanism. AWS measures latency between resolver networks and its regions; Route 53 answers with the record for the lowest-latency healthy region.
- Active-active fit. Pairs naturally with multi-region deployments where every region serves the same content.
- Not anycast. Latency-based is DNS-level (converges in TTL windows); anycast is BGP-level (converges in seconds).
- Resolver bias. Routing decisions are based on the resolver's location, not the client's, unless the resolver forwards EDNS Client Subnet (ECS); mobile and corporate networks can skew the result.
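A toy model of latency-based selection, including the resolver-bias effect, is sketched below. The latency table, region IPs, and network labels are invented for illustration; Route 53's real latency data is internal to AWS and is not queried like this. The fallback when every region is unhealthy mirrors Route 53's documented behaviour of answering as if all records in the group were healthy.

```python
from typing import Optional

# Hypothetical measured latency (ms) from each network Route 53 sees to each region.
LATENCY_MS = {
    "frankfurt-resolver": {"eu-central-1": 12, "us-east-1": 95, "ap-southeast-1": 170},
    "singapore-client-subnet": {"eu-central-1": 160, "us-east-1": 220, "ap-southeast-1": 8},
}

# Hypothetical per-region answers (documentation IP ranges).
REGION_IPS = {
    "eu-central-1": "192.0.2.10",
    "us-east-1": "198.51.100.20",
    "ap-southeast-1": "203.0.113.30",
}


def query_source(resolver_net: str, client_subnet: Optional[str] = None) -> str:
    """Network used for the lookup: the ECS client subnet if the resolver sent one."""
    return client_subnet if client_subnet is not None else resolver_net


def latency_answer(source: str, healthy: dict[str, bool]) -> str:
    """IP of the lowest-latency healthy region for this source network."""
    latencies = LATENCY_MS[source]
    candidates = [r for r in REGION_IPS if healthy.get(r, True)]
    if not candidates:
        # No healthy region: answer as if every region were healthy.
        candidates = list(REGION_IPS)
    best = min(candidates, key=lambda r: latencies[r])
    return REGION_IPS[best]


# A phone in Singapore using a Frankfurt resolver.
print(latency_answer(query_source("frankfurt-resolver"), {}))  # -> 192.0.2.10
print(latency_answer(query_source("frankfurt-resolver", "singapore-client-subnet"), {}))  # -> 203.0.113.30
print(latency_answer(query_source("frankfurt-resolver", "singapore-client-subnet"),
                     {"ap-southeast-1": False}))  # -> 192.0.2.10
```

Without ECS the Singapore client is steered to Frankfurt because that is where its resolver sits; with ECS it lands in the nearby region, and if that region fails it moves to the next-lowest latency rather than the next-closest on a map.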
Geolocation routing
Geolocation routing is the compliance lever, not the performance lever. Use it when regulation, not latency, drives where requests land.
- Mechanism. Route by the location Route 53 infers for the query (continent, country, or US state); when records overlap, the most specific match wins.
- Compliance use. EU users routed to EU regions for GDPR; KSA users routed to ME regions for PDPL.
- Performance trade-off. Less precise than latency-based for speed; geographically close is not always lowest latency.
- Chaining. Geolocation at the country or continent level, with each geolocation record aliasing a latency-based record set across that area's regions, is a common combination.
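The matching and chaining rules can be sketched as follows. Route 53 answers with the most specific geolocation record that matches (US state, then country, then continent, then the default record, which Route 53 represents as country code `*`); with no default record, unmatched locations get no answer. The pool names, region lists, and latency numbers are hypothetical, and the alias hop to a latency-based record set is modeled as a plain dictionary lookup.

```python
from typing import Optional

# Hypothetical geolocation records; each one aliases a latency-based pool.
# Note: continent code "SA" is South America, country code "SA" is Saudi Arabia.
GEO_RECORDS = {
    ("country", "SA"): "me-pool",
    ("continent", "EU"): "eu-pool",
    ("default",): "global-pool",
}

# Hypothetical latency-based record sets the geolocation records alias to.
POOLS = {
    "me-pool": ["me-central-1"],
    "eu-pool": ["eu-west-1", "eu-central-1"],
    "global-pool": ["us-east-1", "eu-west-1", "ap-southeast-1"],
}


def geo_match(records: dict, continent: str, country: str,
              subdivision: Optional[str] = None) -> Optional[str]:
    """Most specific match wins: US state, then country, then continent, then default."""
    keys = []
    if subdivision is not None:
        keys.append(("subdivision", country, subdivision))
    keys += [("country", country), ("continent", continent), ("default",)]
    for key in keys:
        if key in records:
            return records[key]
    return None  # no default record: Route 53 returns no answer


def route(continent: str, country: str, latency_ms: dict[str, float],
          subdivision: Optional[str] = None) -> Optional[str]:
    """Geolocation picks the pool; latency picks the region inside it."""
    pool = geo_match(GEO_RECORDS, continent, country, subdivision)
    if pool is None:
        return None
    return min(POOLS[pool], key=lambda r: latency_ms.get(r, float("inf")))


print(route("EU", "DE", {"eu-west-1": 25, "eu-central-1": 9}))  # -> eu-central-1
print(route("AS", "SA", {"me-central-1": 30, "eu-central-1": 12}))  # -> me-central-1
```

The second call shows the compliance property: the Saudi query stays in its pool even though a region outside it is faster.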
Picking a strategy
The choice usually decides itself once you name the constraint that matters most: failover, migration, performance, or compliance.
- Active-passive failover. Primary-secondary; simplest, well-understood, the default unless something else applies.
- Canary or migration. Weighted; gradual control, easy rollback by shifting the dial.
- Global performance. Latency-based; default for performance-sensitive multi-region apps.
- Regulation-driven. Geolocation; the only correct answer when compliance is the constraint.