Connection Timeout Tuning
Idle, read, write timeouts.
Overview
Connection timeout tuning matches timeouts across the network stack. Misaligned timeouts produce confusing failure modes (LB longer than upstream means LB times out before upstream finishes, the opposite means upstream completes work the LB has discarded); the discipline aligns the chain.
- Idle, read, write timeouts. Per-tier timeout; each layer has its own configuration that needs to match the chain.
- Per-tier alignment. LB timeout less than upstream timeout; produces predictable behaviour and avoids work-after-disconnect.
- Application timeout. Per-call deadline via context; matches workload by setting per-call budget.
- Per-RPC timeout plus documented chain. Per-RPC deadline matches modern stacks; per-tier timeout chain documented for review.
The approach
The practical approach: per-tier alignment with LB shortest, application context deadline per call, per-RPC timeout for modern stacks, monitor timeout rate per tier, documented chain. The team’s discipline produces predictable timeouts rather than mysterious failures.
- Per-tier alignment. LB less than app, app less than DB; the chain shapes when each layer gives up.
- Application deadline. Per-call context deadline; the call has a budget, not infinite patience.
- Per-RPC timeout. Per-RPC deadline; modern stacks support this natively.
- Monitor timeout rate plus documented chain. Per-tier timeout rate supports investigation; per-tier timeout chain committed for operational reviews.
Why this compounds
Timeout tuning discipline compounds across services. Each correctly-aligned tier produces ongoing reliability; the team’s reliability expertise grows; new services inherit the timeout chain pattern.
- Better predictability. Aligned timeouts produce predictable behaviour; the failure mode matches what the team expects.
- Better incident response. Right timeouts catch real failures; the timeout fires when the work actually failed.
- Better operational fit. Right deadline matches workload; user-facing requests get tight deadlines, batch gets relaxed.
- Institutional knowledge. Each timeout teaches network patterns; the team’s reliability muscle grows.
Timeout tuning discipline is an operational discipline that pays off across years. Nova AI Ops integrates with network telemetry, surfaces patterns, and supports the team’s reliability discipline.