TCP vs UDP for SREs
Protocol choice.
Overview
Choosing between TCP and UDP means picking the transport protocol that fits the workload. TCP provides reliability, UDP provides low overhead, and QUIC offers both at the cost of added complexity; the choice should match the workload, not the team’s comfort.
- Protocol choice. TCP for reliability, UDP for low overhead, QUIC to bridge the two; pick the one that matches the workload.
- TCP reliability. Bytes arrive in order, lost segments are retransmitted, and sending is congestion-controlled; the right shape for business-critical traffic.
- UDP low overhead. No handshake, no ordering, no delivery guarantee; the right shape for real-time traffic (voice, video, gaming, telemetry).
- QUIC and use-case alignment. QUIC provides reliable, multiplexed streams over UDP for the modern web; each workload gets the protocol that matches it.
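The difference between the two transports is easiest to see at the socket level. The following is a minimal sketch, assuming a local loopback interface; the function names `udp_roundtrip` and `tcp_roundtrip` are illustrative and not part of any library. It shows that UDP sends data with no handshake and keeps each datagram as a separate message, while TCP establishes a connection first and delivers an ordered byte stream.

```python
import socket

LOOPBACK = ("127.0.0.1", 0)  # port 0 asks the OS for any free port


def udp_roundtrip(messages):
    """Send each message as one datagram; return the datagrams that arrive."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        receiver.bind(LOOPBACK)
        receiver.settimeout(1.0)
        for msg in messages:
            # No connect() and no handshake: the first packet already carries data.
            sender.sendto(msg, receiver.getsockname())
        received = []
        try:
            for _ in messages:
                data, _addr = receiver.recvfrom(2048)
                received.append(data)  # one recvfrom() returns one whole datagram
        except socket.timeout:
            pass  # a lost datagram is simply missing; UDP does not retransmit
        return received
    finally:
        sender.close()
        receiver.close()


def tcp_roundtrip(messages):
    """Send messages over one TCP connection; return the byte stream that arrives."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(LOOPBACK)
    listener.listen(1)
    listener.settimeout(1.0)
    # create_connection() returns only after the three-way handshake completes.
    client = socket.create_connection(listener.getsockname(), timeout=1.0)
    server, _addr = listener.accept()
    server.settimeout(1.0)
    try:
        for msg in messages:
            client.sendall(msg)
        client.shutdown(socket.SHUT_WR)  # tell the peer no more data is coming
        chunks = []
        while True:
            chunk = server.recv(2048)
            if not chunk:  # an empty read means the sender closed its side
                break
            chunks.append(chunk)
        # TCP is a byte stream: order is preserved, message boundaries are not.
        return b"".join(chunks)
    finally:
        client.close()
        server.close()
        listener.close()


if __name__ == "__main__":
    print(udp_roundtrip([b"frame-1", b"frame-2"]))
    print(tcp_roundtrip([b"debit;", b"credit;"]))
```

On loopback, datagrams are rarely lost, so this sketch does not demonstrate loss itself; it only shows where the application would have to cope with a missing datagram, and that a TCP receiver has to add its own framing if it needs message boundaries.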
The approach
The practical approach: use TCP for business-critical traffic, UDP for real-time traffic, and QUIC for the modern web; document the choice per service; and monitor each protocol. Applied consistently, this keeps every service on a transport that matches its workload.
- TCP for business-critical. Reliability matters more than latency; payments, transactions, anything where loss is unacceptable.
- UDP for real-time. Voice, video, gaming; the application handles loss; latency is the constraint.
- QUIC for modern web. Reliability over UDP with low latency; HTTP/3 is the canonical example.
- Document the choice. Commit each service’s protocol to the repo so operators know which transport the service uses.
- Monitor the protocol. Track per-protocol metrics: TCP retransmits, UDP drops, QUIC stream stalls.
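As one way to collect the TCP and UDP metrics named above, the sketch below parses kernel counters in the format Linux exposes at `/proc/net/snmp`. It is a minimal sketch, assuming a Linux host; the helper names and the sample values in `SAMPLE_SNMP` are made up for illustration. QUIC stream stalls do not appear in these counters, because the kernel sees QUIC traffic only as UDP datagrams; those would have to come from whatever instrumentation the QUIC library in use provides.

```python
# Illustrative sample in /proc/net/snmp format; the values are made up.
SAMPLE_SNMP = """\
Tcp: RtoAlgorithm RtoMin RtoMax MaxConn ActiveOpens PassiveOpens AttemptFails EstabResets CurrEstab InSegs OutSegs RetransSegs InErrs OutRsts InCsumErrors
Tcp: 1 200 120000 -1 100 50 2 3 10 100000 90000 450 0 12 0
Udp: InDatagrams NoPorts InErrors OutDatagrams RcvbufErrors SndbufErrors InCsumErrors IgnoredMulti
Udp: 5000 3 20 4800 17 0 0 0
"""


def parse_snmp(text):
    """Parse /proc/net/snmp-style text into {protocol: {counter: value}}."""
    counters = {}
    lines = [line for line in text.splitlines() if line.strip()]
    # The file comes in pairs: a line of counter names, then a line of values.
    for header, values in zip(lines[0::2], lines[1::2]):
        proto, names = header.split(":", 1)
        _proto, numbers = values.split(":", 1)
        counters[proto] = dict(zip(names.split(), map(int, numbers.split())))
    return counters


def tcp_retransmit_ratio(counters):
    """Retransmitted segments as a fraction of all segments sent."""
    tcp = counters["Tcp"]
    return tcp["RetransSegs"] / tcp["OutSegs"] if tcp["OutSegs"] else 0.0


def udp_receive_drops(counters):
    """Datagrams dropped because a socket receive buffer was full."""
    return counters["Udp"]["RcvbufErrors"]


if __name__ == "__main__":
    # On a Linux host, read the live counters instead of the sample:
    #   with open("/proc/net/snmp") as f: counters = parse_snmp(f.read())
    counters = parse_snmp(SAMPLE_SNMP)
    print(tcp_retransmit_ratio(counters), udp_receive_drops(counters))
```

These counters are cumulative since boot, so a monitoring job would normally sample them periodically and alert on the change between samples rather than on the absolute value.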
Why this compounds
Transport protocol discipline compounds across services. Each correct choice keeps paying off, the team’s networking expertise grows, and new services are more likely to pick the right protocol on the first try.
- Better latency. User-visible latency tracks how well the protocol fits the workload.
- Better reliability. With TCP carrying business-critical traffic, lost segments are retransmitted instead of silently dropped.
- Better cost efficiency. UDP carries less per-packet and per-connection overhead, which can lower bandwidth costs on real-time workloads.
- Institutional knowledge. Each choice teaches transport patterns; the team’s networking muscle grows.
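The overhead point can be made concrete with header arithmetic. The sketch below uses the minimum header sizes (20 bytes for IPv4, 20 bytes for TCP without options, 8 bytes for UDP); the 160-byte payload is a hypothetical small real-time packet chosen for illustration. It ignores link-layer framing, TCP acknowledgements, handshakes, and retransmissions, so it understates the difference in TCP's disfavor and says nothing about any specific bill.

```python
IPV4_HEADER = 20  # bytes, minimum size with no options
TCP_HEADER = 20   # bytes, minimum size with no options
UDP_HEADER = 8    # bytes, fixed size


def header_overhead(payload_bytes, transport_header):
    """Fraction of each packet spent on IP and transport headers."""
    headers = IPV4_HEADER + transport_header
    return headers / (payload_bytes + headers)


if __name__ == "__main__":
    payload = 160  # hypothetical small real-time payload, in bytes
    print(f"TCP: {header_overhead(payload, TCP_HEADER):.1%}")
    print(f"UDP: {header_overhead(payload, UDP_HEADER):.1%}")
```

The smaller the payload, the larger the share of each packet that headers consume, which is why the gap matters most for high-rate streams of small packets.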
Choosing deliberately between TCP and UDP is an operational discipline that pays off over years. Nova AI Ops integrates with transport telemetry, surfaces patterns, and supports the team’s networking discipline.