Database Replicas: Read Replicas vs Failover Replicas

Read replicas and failover replicas look similar but serve different purposes. Conflating them creates surprises.

Why distinguish

The two look the same in the documentation: both are copies of the primary kept current by replication. What differs is the job each one is expected to do, and conflating them creates surprises at the worst possible moment.

Read replica. Serves read traffic; replication lag is tolerated; not promotion-ready.
Failover replica. Standby; minimal lag; ready to take over on primary failure.
Same shape, different intent. Both are replicas; the operational role is what differs.
Conflation cost. The cost of treating one as the other shows up during an incident, not on a quiet day.

Four criteria

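1. Lag tolerance. How far behind the primary the replica may fall: minutes for a read replica, seconds for a failover replica.
2. Promotion-ready. Whether the replica can take over as primary at any moment.
3. Resource sizing. Whether the instance must match the primary's size or can be smaller.
4. Application connection. Whether the app reaches the replica through a separate read endpoint or through the primary endpoint after a switch.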
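
One way to make the four criteria concrete is to record them as a per-role profile. The sketch below is illustrative only: the names Role, RoleProfile, and PROFILES are hypothetical, and the lag numbers are assumed placeholders chosen to reflect "minutes" versus "seconds", not recommendations.

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    READ = "read"
    FAILOVER = "failover"


@dataclass(frozen=True)
class RoleProfile:
    max_lag_seconds: float    # 1. lag tolerance
    promotion_ready: bool     # 2. promotion-ready
    match_primary_size: bool  # 3. resource sizing
    endpoint: str             # 4. application connection


# Assumed example values; tune them to your own workload.
PROFILES = {
    Role.READ: RoleProfile(
        max_lag_seconds=300.0,      # minutes of lag are acceptable
        promotion_ready=False,
        match_primary_size=False,   # a smaller instance is acceptable
        endpoint="read",            # separate read endpoint
    ),
    Role.FAILOVER: RoleProfile(
        max_lag_seconds=5.0,        # lag matters in seconds
        promotion_ready=True,
        match_primary_size=True,    # same size as the primary
        endpoint="primary",         # reached via the primary endpoint switch
    ),
}
```

Keeping both profiles side by side makes it harder for a replica to drift into a mixed role without someone noticing.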

Configuration differences

The two roles need different configuration. Replication mode, instance sizing, and application connection paths all diverge.

Read replica. Async replication is fine; smaller instance acceptable; app connects via separate read endpoint.
Failover replica. Synchronous replication if possible; same size as primary; app reaches it via a DNS or endpoint switch.
Endpoint shape. Read traffic goes to a load-balanced read endpoint; failover happens at the primary endpoint level.
Lag monitoring. Both monitored, with different thresholds; failover lag matters in seconds, read lag in minutes.
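
The different lag thresholds can be sketched as a small classifier. This is a minimal example under stated assumptions: the function lag_status and the LAG_THRESHOLDS table are hypothetical, and the numbers are placeholders that only preserve the seconds-versus-minutes split described above. How you obtain the lag value depends on your database and is not shown.

```python
# Assumed thresholds in seconds, as (warn, critical) per role.
LAG_THRESHOLDS = {
    "failover": (2.0, 10.0),    # failover lag matters in seconds
    "read": (120.0, 600.0),     # read lag matters in minutes
}


def lag_status(role: str, lag_seconds: float) -> str:
    """Classify replication lag as 'ok', 'warn', or 'critical' for a role."""
    warn, critical = LAG_THRESHOLDS[role]
    if lag_seconds >= critical:
        return "critical"
    if lag_seconds >= warn:
        return "warn"
    return "ok"


if __name__ == "__main__":
    # The same 30 seconds of lag means different things per role.
    print(lag_status("read", 30.0))      # ok
    print(lag_status("failover", 30.0))  # critical
```

The point of the example is that one measured lag value should never be judged against a single shared threshold; the role decides which limit applies.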

Conflation mistake

The two failure modes from conflation are mirror images. Each one ruins a different day; both are avoidable with explicit role assignment.

Read replica as failover. The primary dies and you fail over to a replica that lags by minutes; the writes in that gap are lost.
Failover replica as read source. Read load slows replication; lag grows; promotion-readiness compromised.
Mixed role. One replica trying to be both; serves neither role well; expect either data loss or slow failover.
Documented role. Each replica's role written down; no ambiguity at 3am when the primary is down.
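
A documented role is easier to trust when something checks it. The sketch below audits a hand-maintained inventory for the conflation cases listed above. Everything in it is hypothetical: the Replica fields, the audit function, and the replica names are assumptions for illustration, and a real inventory would be populated from your own tooling.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Replica:
    name: str
    role: Optional[str]      # "read", "failover", or None if undocumented
    serves_reads: bool       # currently receives application read traffic
    promotion_target: bool   # configured as the node to promote on failure


def audit(replicas: List[Replica]) -> List[str]:
    """Return one finding per replica whose usage contradicts its role."""
    findings = []
    for r in replicas:
        if r.role is None:
            findings.append(f"{r.name}: role is not documented")
        elif r.role == "read" and r.promotion_target:
            findings.append(f"{r.name}: read replica used as failover target (data loss risk)")
        elif r.role == "failover" and r.serves_reads:
            findings.append(f"{r.name}: failover replica serving reads (promotion may be delayed)")
    return findings


if __name__ == "__main__":
    inventory = [
        Replica("replica-a", "read", serves_reads=True, promotion_target=False),
        Replica("replica-b", "failover", serves_reads=True, promotion_target=True),
        Replica("replica-c", None, serves_reads=False, promotion_target=False),
    ]
    for finding in audit(inventory):
        print(finding)
```

Running a check like this on a schedule turns the written role into something that is verified before the primary goes down rather than discovered after.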

Antipatterns

Read replica as failover. Data loss risk.
Failover replica handling read load. Promotion delayed.
One replica for both. Confused responsibilities.

What to do this week

Three moves. (1) Write down the role of every replica behind your most-loaded database and fix any replica doing both jobs. (2) Measure replication lag, query latency, and write throughput before and after. (3) Document the win and the constraint so the next refactor inherits the knowledge.