Performance vs Reliability

The two sometimes trade against each other.

Overview

Performance and reliability sometimes trade against each other. Aggressive caching cuts latency and reduces consistency; synchronous replication preserves durability and adds latency; retries improve reliability and cost throughput; multi-AZ failover produces resilience at the cost of cross-AZ latency. The discipline is recognising the trade-off explicitly and choosing per service rather than chasing one at the expense of the other.

Sometimes they trade. Aggressive caching reduces consistency; synchronous replication adds latency. The trade-off is real and worth naming.
Caching trade-offs. Serving from cache risks stale data; going to the origin pays cache-miss latency. The workload decides which side wins.
Sync versus async replication. Sync buys durability at the cost of write latency; async buys lower latency at the cost of possible data loss on failover. RPO requirements decide.
Retry overhead plus failover overhead. Retries improve reliability at the cost of throughput; multi-AZ produces resilience at the cost of cross-AZ latency.
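The trade-offs above can be put into rough numbers before choosing a side. The sketch below is a back-of-envelope model, not a measurement tool: the function names and all parameters are hypothetical, attempts are assumed to fail independently, and latencies are treated as simple averages.

```python
def expected_read_latency_ms(hit_ratio: float, cache_ms: float, origin_ms: float) -> float:
    """Mean read latency when a fraction `hit_ratio` of reads is served from cache."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * cache_ms + (1.0 - hit_ratio) * origin_ms


def async_replication_fits_rpo(worst_case_lag_s: float, rpo_s: float) -> bool:
    """Async replication can lose up to the replication lag on failover,
    so under this simplified model it fits only when that lag stays within the RPO."""
    return worst_case_lag_s <= rpo_s


def retry_tradeoff(failure_rate: float, max_attempts: int) -> tuple[float, float]:
    """Return (success_probability, expected_attempts_per_request).

    Assumes each attempt fails independently with probability `failure_rate`.
    """
    if not 0.0 <= failure_rate < 1.0:
        raise ValueError("failure_rate must be in [0, 1)")
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    # The request fails only if every attempt fails.
    success = 1.0 - failure_rate ** max_attempts
    # Attempt k+1 is issued only when the first k attempts all failed.
    expected_attempts = sum(failure_rate ** k for k in range(max_attempts))
    return success, expected_attempts


if __name__ == "__main__":
    print(f"read latency: {expected_read_latency_ms(0.9, 1.0, 50.0):.2f} ms")
    print(f"async fits a 5 s RPO with 2 s lag: {async_replication_fits_rpo(2.0, 5.0)}")
    success, attempts = retry_tradeoff(failure_rate=0.1, max_attempts=3)
    print(f"success={success:.3f}, attempts per request={attempts:.2f}")
```

The retry function makes the cost side visible: every extra attempt raises the success probability and also raises the load sent downstream per request, which is the throughput being spent.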

The approach

Three habits keep the trade-off explicit: priority per tier so customer-facing services prioritise reliability, documented rationale per decision, and monitoring of both signals so engineering can see when the trade-off bites.

Per-tier priority. Customer-facing services prioritise reliability. Background services can prioritise throughput.
Documented trade-off rationale. Each decision records why that side was chosen. Future readers inherit the reasoning.
Monitor both signals. Performance and reliability metrics on the same dashboard. The trade-off becomes visible.
Test failure modes plus per-service choice. Chaos testing validates the trade-offs; each service has its priority documented.
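One way to make these habits concrete is to keep each decision as a small record next to the service and to check both signals together. The sketch below is illustrative only: the type names, tier names, service name, and thresholds are all hypothetical and not taken from any particular tool.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    CUSTOMER_FACING = "customer-facing"
    BACKGROUND = "background"


class Priority(Enum):
    RELIABILITY = "reliability"
    THROUGHPUT = "throughput"


# Per-tier defaults as described above (assumed mapping).
DEFAULT_PRIORITY = {
    Tier.CUSTOMER_FACING: Priority.RELIABILITY,
    Tier.BACKGROUND: Priority.THROUGHPUT,
}


@dataclass(frozen=True)
class TradeoffDecision:
    service: str
    tier: Tier
    priority: Priority
    rationale: str  # the "why this side" that future readers inherit

    def __post_init__(self) -> None:
        if not self.rationale.strip():
            raise ValueError(f"{self.service}: a decision needs a written rationale")

    @property
    def deviates_from_tier_default(self) -> bool:
        return self.priority is not DEFAULT_PRIORITY[self.tier]


@dataclass(frozen=True)
class Signals:
    p99_latency_ms: float
    error_rate: float


def breached(signals: Signals, max_p99_ms: float, max_error_rate: float) -> list[str]:
    """Check the performance and reliability signals side by side."""
    problems = []
    if signals.p99_latency_ms > max_p99_ms:
        problems.append("latency")
    if signals.error_rate > max_error_rate:
        problems.append("errors")
    return problems


if __name__ == "__main__":
    decision = TradeoffDecision(
        service="checkout",  # hypothetical service name
        tier=Tier.CUSTOMER_FACING,
        priority=Priority.RELIABILITY,
        rationale="Failed payments cost more than added write latency.",
    )
    print(decision.deviates_from_tier_default)
    print(breached(Signals(p99_latency_ms=250.0, error_rate=0.001), 200.0, 0.01))
```

Rejecting an empty rationale at construction time is a deliberate choice in this sketch: it turns "documented rationale" from a convention into something a review or CI check could enforce.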

Why this compounds

Each correct trade-off adds operational quality over the year. The team’s engineering maturity deepens; new services arrive at decisions based on data; incident response improves because responders know which side of the trade-off the system was operating on.

Engineering culture matures. Trade-off awareness produces real engineering thinking instead of one-sided optimisation.
Operational fit improves. Each workload gets the right priority. Customer-facing tiers stay reliable; background tiers stay fast.
Incident response improves. Trade-off awareness supports investigation. MTTR drops on incidents that touch the trade-off.
Year-one investment, year-two habit. The first documented trade-off is a heavy lift. By the third service, the methodology is settled.