Route 53 Failover Strategies

Route 53 supports multiple failover strategies. Each section below gives the decision rule for one use case.

Primary-secondary failover

Primary-secondary is the simplest active-passive pattern Route 53 supports. The secondary record only serves traffic when health checks on the primary fail.

Two records. One primary, one secondary; Route 53 swaps them automatically based on health-check results.
Active-passive fit. Primary serves the production region; secondary is the warm standby in another region.
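Health checks. Endpoint checks over HTTP, HTTPS, or TCP; calculated checks that combine other health checks; or checks that track a CloudWatch alarm. Endpoint checks run at a 30-second (standard) or 10-second (fast) request interval.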
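Failback. Once the primary passes enough consecutive health checks to be marked healthy again, Route 53 resumes answering with the primary, and clients follow as cached answers expire; consider session stickiness if it matters.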
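
The two-record setup above could be expressed as a change batch along these lines. This is a minimal sketch: it only builds a dictionary in the shape that boto3's `change_resource_record_sets` call accepts as `ChangeBatch`, and it does not call AWS. The helper names, the domain, the IP addresses (taken from documentation ranges), and the health check ID are placeholder assumptions.

```python
from typing import Any, Dict, Optional


def failover_record(name: str, ip: str, role: str,
                    health_check_id: Optional[str] = None,
                    ttl: int = 60) -> Dict[str, Any]:
    """Build one failover record set (hypothetical helper, not an AWS API)."""
    if role not in ("PRIMARY", "SECONDARY"):
        raise ValueError("role must be PRIMARY or SECONDARY")
    record: Dict[str, Any] = {
        "Name": name,
        "Type": "A",
        # SetIdentifier distinguishes records that share a name and type.
        "SetIdentifier": f"{role.lower()}-record",
        "Failover": role,
        "TTL": ttl,
        "ResourceRecords": [{"Value": ip}],
    }
    if health_check_id is not None:
        record["HealthCheckId"] = health_check_id
    return record


def failover_change_batch(name: str, primary_ip: str, secondary_ip: str,
                          primary_health_check_id: str) -> Dict[str, Any]:
    """Pair a health-checked primary with a standby secondary."""
    records = [
        failover_record(name, primary_ip, "PRIMARY", primary_health_check_id),
        failover_record(name, secondary_ip, "SECONDARY"),
    ]
    return {
        "Comment": "primary-secondary failover (illustrative)",
        "Changes": [
            {"Action": "UPSERT", "ResourceRecordSet": r} for r in records
        ],
    }


# Placeholder values: example domain, documentation IPs, made-up check ID.
batch = failover_change_batch(
    "app.example.com.", "192.0.2.10", "198.51.100.10", "hc-primary-placeholder"
)
```

Passing `batch` as `ChangeBatch`, together with a `HostedZoneId`, to `change_resource_record_sets` would apply it; that call is left out so the sketch runs without credentials.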

Weighted routing

Weighted routing splits traffic in proportion to a configurable weight on each record. It is the DNS-layer canary tool when application-layer canary is impractical.

Split. Assign each record an integer weight from 0 to 255; its share of traffic is its weight divided by the sum of all weights (e.g. 70/30 between two regions, or 95/5 for a canary deployment).
Use case. Gradual migration between versions, regions, or implementations; tune the dial in steps.
TTL caveat. Weight changes take effect over the TTL window; set TTL to 60s when actively shifting traffic.
Caching reality. Some resolvers ignore TTL; expect a long tail of clients on the old weight for hours.
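
The arithmetic behind the split can be sketched as follows. This is an illustrative model, not Route 53 code: `traffic_shares` is a hypothetical helper, and the shares it returns describe what fraction of DNS answers each record should receive over time, not what cached clients actually see during a shift.

```python
from typing import Dict


def traffic_shares(weights: Dict[str, int]) -> Dict[str, float]:
    """Return each record's expected share of answers: weight / sum(weights)."""
    for name, weight in weights.items():
        if not 0 <= weight <= 255:
            raise ValueError(f"weight for {name!r} must be between 0 and 255")
    total = sum(weights.values())
    if total == 0:
        # When every weight is 0, Route 53 spreads traffic evenly
        # rather than returning nothing.
        return {name: 1.0 / len(weights) for name in weights}
    return {name: weight / total for name, weight in weights.items()}


# A canary ramp expressed as successive weight settings (illustrative steps).
canary_ramp = [
    {"stable": 95, "canary": 5},
    {"stable": 70, "canary": 30},
    {"stable": 0, "canary": 100},
]
ramp_shares = [traffic_shares(step) for step in canary_ramp]
```

Because only the ratio matters, 95/5 and 190/10 describe the same split; rolling back means restoring the previous weights and waiting out the TTL.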

Latency-based routing

Latency-based routing sends each client to the AWS region with the lowest measured latency. It is the usual choice for performance-sensitive global apps.

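Mechanism. AWS maintains latency measurements between networks and its regions; Route 53 answers with the record for the healthy region that has the lowest latency from the resolver, which is not always the geographically closest one.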
Active-active fit. Pairs naturally with multi-region deployments where every region serves the same content.
Not anycast. Latency-based is DNS-level (converges in TTL windows); anycast is BGP-level (converges in seconds).
Resolver bias. Routing decisions are based on the resolver's location, not the client's, unless the resolver forwards EDNS Client Subnet data; mobile and corporate networks can skew the result.
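
The selection rule can be modeled in a few lines. This is a toy sketch, not how Route 53 is implemented: `pick_region` is a hypothetical helper, the latency figures are made up, and the fallback to all regions when none is healthy reflects Route 53's documented behavior of treating every record as healthy in that case.

```python
from typing import Dict, Optional


def pick_region(latency_ms: Dict[str, float],
                healthy: Dict[str, bool]) -> Optional[str]:
    """Pick the lowest-latency healthy region for one resolver."""
    if not latency_ms:
        return None
    candidates = {
        region: ms for region, ms in latency_ms.items()
        if healthy.get(region, False)
    }
    if not candidates:
        # No healthy region: fall back to considering every region,
        # so the resolver still gets an answer.
        candidates = dict(latency_ms)
    return min(candidates, key=lambda region: candidates[region])


# Made-up latencies as seen from one resolver.
latency_from_resolver = {"us-east-1": 80.0, "eu-west-1": 20.0}
all_up = pick_region(latency_from_resolver,
                     {"us-east-1": True, "eu-west-1": True})
eu_down = pick_region(latency_from_resolver,
                      {"us-east-1": True, "eu-west-1": False})
```

The input is keyed by the resolver's view of latency, which is why a client behind a distant resolver can be sent to a region that is slow for it.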

Geolocation routing

Geolocation routing is the compliance lever, not the performance lever. Use it when regulation, not latency, drives where requests land.

Mechanism. Route by the geographic origin of the query (continent, country, or US state); a default record catches locations that match nothing else.
Compliance use. EU users routed to EU regions for GDPR; KSA users routed to ME regions for PDPL.
Performance trade-off. Less precise than latency-based for speed; geographically close is not always lowest latency.
Chaining. Geolocation to country plus latency-based within country is a common combination.
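
When several geolocation records could match, Route 53 prefers the smallest geographic area. The lookup order can be sketched like this. It is a simplified model: `resolve_geolocation`, the key prefixes, and the endpoint labels are hypothetical, and real queries are located by IP address, which is not always accurate.

```python
from typing import Dict, Optional


def resolve_geolocation(records: Dict[str, str], continent: str, country: str,
                        us_state: Optional[str] = None) -> Optional[str]:
    """Return the endpoint for the most specific matching location."""
    lookup_order = []
    if country == "US" and us_state:
        lookup_order.append(f"state:{us_state}")
    # Prefixes keep country and continent codes apart
    # (for example "NA" is both Namibia and North America).
    lookup_order += [f"country:{country}", f"continent:{continent}", "default"]
    for key in lookup_order:
        if key in records:
            return records[key]
    # No match and no default record: the query gets no answer.
    return None


# Hypothetical record set: EU by continent, Saudi Arabia by country.
geo_records = {
    "continent:EU": "eu-endpoint",
    "country:SA": "me-endpoint",
    "default": "default-endpoint",
}
german_user = resolve_geolocation(geo_records, continent="EU", country="DE")
saudi_user = resolve_geolocation(geo_records, continent="AS", country="SA")
brazil_user = resolve_geolocation(geo_records, continent="SA", country="BR")
```

Without the `default` entry, the Brazilian query in this example would get no answer, which is why a default record is worth adding even in a compliance-driven setup.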

Picking a strategy

The choice usually decides itself once you name the constraint that matters most: failover, migration, performance, or compliance.

Active-passive failover. Primary-secondary; simplest, well-understood, the default unless something else applies.
Canary or migration. Weighted; gradual control, easy rollback by shifting the dial.
Global performance. Latency-based; the usual choice for performance-sensitive multi-region apps.
Regulation-driven. Geolocation; the only correct answer when compliance is the constraint.
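
The decision rule above amounts to a lookup from the dominant constraint to a routing policy. A minimal sketch, in which the constraint names and the `pick_policy` helper are illustrative assumptions rather than Route 53 terms:

```python
# Dominant constraint -> routing policy, as summarized in this section.
POLICY_BY_CONSTRAINT = {
    "failover": "primary-secondary failover",
    "migration": "weighted",
    "performance": "latency-based",
    "compliance": "geolocation",
}


def pick_policy(constraint: str) -> str:
    """Map the constraint that matters most to a routing policy."""
    try:
        return POLICY_BY_CONSTRAINT[constraint]
    except KeyError:
        known = ", ".join(sorted(POLICY_BY_CONSTRAINT))
        raise ValueError(
            f"unknown constraint {constraint!r}; expected one of: {known}"
        ) from None


chosen = pick_policy("compliance")
```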