Performance vs Reliability
Sometimes they trade against each other.
Overview
Performance and reliability sometimes trade against each other. Aggressive caching cuts latency and reduces consistency; synchronous replication preserves durability and adds latency; retries improve reliability and cost throughput; multi-AZ failover produces resilience at the cost of cross-AZ latency. The discipline is recognising the trade-off explicitly and choosing per service rather than chasing one at the expense of the other.
- Sometimes they trade. Aggressive caching reduces consistency; replication adds latency. The trade-off is real and worth naming.
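- Caching trade-offs. Stale cache versus cache-miss latency: caching longer serves more hits but staler data, caching less keeps data fresh but pays the miss latency more often. Workload decides which side wins.
- Sync versus async replication. Sync preserves durability at the cost of write latency; async keeps write latency low but risks losing recent writes on failover. RPO requirements decide.
- Retry overhead plus failover overhead. Retries improve reliability and cost throughput; multi-AZ produces resilience at the cost of cross-AZ latency.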
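The retry side of the trade-off can be put into numbers. The sketch below is illustrative only: it assumes attempts fail independently with the same probability, that every attempt costs the same latency, and that there is no backoff delay. The function name retry_tradeoff and its parameters are hypothetical, not taken from any library.

```python
def retry_tradeoff(p_success: float, max_attempts: int, attempt_latency_ms: float):
    """Estimate what a retry policy buys and what it costs.

    Assumptions (illustrative only): attempts are independent, each
    succeeds with probability p_success, each costs attempt_latency_ms,
    and no backoff delay is added between attempts.

    Returns (overall success probability, expected attempts per request,
    worst-case latency in ms).
    """
    if not 0.0 < p_success <= 1.0:
        raise ValueError("p_success must be in (0, 1]")
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    p_fail = 1.0 - p_success
    # Reliability gain: the request fails only if every attempt fails.
    overall_success = 1.0 - p_fail ** max_attempts
    # Throughput cost: mean of a geometric distribution truncated at max_attempts.
    expected_attempts = overall_success / p_success
    # Latency cost: every attempt is used up before giving up.
    worst_case_latency_ms = max_attempts * attempt_latency_ms
    return overall_success, expected_attempts, worst_case_latency_ms


if __name__ == "__main__":
    success, attempts, worst_ms = retry_tradeoff(0.9, 3, 50.0)
    print(f"success={success:.4f} attempts={attempts:.2f} worst_case={worst_ms:.0f}ms")
```

Under these assumptions, three attempts at a 90% per-attempt success rate lift overall success to about 99.9%, at the price of roughly 11% more backend calls on average and a worst-case latency three times that of a single attempt.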
The approach
Three habits keep the trade-off explicit: priority per tier so customer-facing services prioritise reliability, documented rationale per decision, and monitoring of both signals so engineering can see when the trade-off bites.
- Per-tier priority. Customer-facing services prioritise reliability. Background services can prioritise throughput.
- Documented trade-off rationale. For each decision, record why this side was chosen. Future readers inherit the reasoning.
- Monitor both signals. Performance and reliability metrics on the same dashboard. The trade-off becomes visible.
- Test failure modes plus per-service choice. Chaos testing validates that each trade-off behaves as intended; each service's priority is documented.
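One way to make these habits concrete is a small per-service record that carries the priority, the rationale, and a threshold for each signal. This is a minimal sketch, not a prescribed format: the names TradeoffDecision, Priority, and breached_signals, the example service, and the threshold values are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Priority(Enum):
    RELIABILITY = "reliability"
    THROUGHPUT = "throughput"


@dataclass(frozen=True)
class TradeoffDecision:
    """Hypothetical per-service record of which side of the trade-off was chosen."""
    service: str
    tier: str
    priority: Priority
    rationale: str              # the documented "why this side"
    max_p99_latency_ms: float   # performance signal threshold
    min_availability: float     # reliability signal threshold, e.g. 0.999

    def __post_init__(self):
        # A decision without a rationale defeats the purpose of recording it.
        if not self.rationale.strip():
            raise ValueError(f"{self.service}: rationale must not be empty")


def breached_signals(decision: TradeoffDecision,
                     p99_latency_ms: float,
                     availability: float) -> List[str]:
    """Check both signals together so neither side of the trade-off is hidden."""
    breaches = []
    if p99_latency_ms > decision.max_p99_latency_ms:
        breaches.append("performance")
    if availability < decision.min_availability:
        breaches.append("reliability")
    return breaches


# Example values are made up for illustration.
checkout = TradeoffDecision(
    service="checkout-api",
    tier="customer-facing",
    priority=Priority.RELIABILITY,
    rationale="Lost orders cost more than extra write latency.",
    max_p99_latency_ms=200.0,
    min_availability=0.999,
)

if __name__ == "__main__":
    print(breached_signals(checkout, p99_latency_ms=250.0, availability=0.9995))
```

Keeping both thresholds on the same record mirrors the same-dashboard habit: a breach on the deprioritised side stays visible, and the rationale field tells the responder whether that breach was an accepted cost or a surprise.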
Why this compounds
Each well-chosen trade-off adds operational quality that accumulates over the year. The team’s engineering maturity deepens; decisions for new services rest on data rather than guesswork; incident response improves because responders know which side of the trade-off the system was operating on.
- Engineering culture matures. Trade-off awareness produces real engineering thinking instead of one-sided optimisation.
- Operational fit improves. Right priority for the workload. Customer-facing tiers stay reliable; background tiers stay fast.
- Incident response improves. Trade-off awareness supports investigation. MTTR tends to drop on incidents that touch the trade-off.
- Year-one investment, year-two habit. The first documented trade-off is a heavy lift. By the third service, the methodology is settled.