Response Streaming
Stream long responses.
Overview
Response streaming starts sending response data before the full response is computed. Total latency is the easy metric to measure; perceived latency is what determines whether the user experiences the system as fast or slow.
- Stream long responses. Per-response time-to-first-byte; matches user perception; the user starts reading before the response completes.
- Server-sent events. Per-event push; the right shape for one-way streaming (notifications, dashboards).
- Chunked transfer encoding plus HTTP/2. HTTP/1.1 chunked encoding is universally supported; HTTP/2 replaces chunked encoding with its own framed streams and adds multiplexing.
- LLM token streaming. Per-token push; matches AI workloads where the user reads as the model generates.
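As a rough illustration of the server-sent events and token streaming items above, the sketch below frames each token as one event in the text/event-stream format. The function names and the "token" and "done" event names are hypothetical, chosen only for this example; a real endpoint would write these strings to the response as they are produced.

```python
from typing import Iterable, Iterator, Optional


def sse_event(data: str, event: Optional[str] = None) -> str:
    """Frame one message in the text/event-stream wire format."""
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    # Each line of a multi-line payload needs its own "data:" field.
    for part in data.split("\n"):
        lines.append(f"data: {part}")
    # A blank line terminates the event.
    return "\n".join(lines) + "\n\n"


def stream_tokens(tokens: Iterable[str]) -> Iterator[str]:
    """Yield one event per token, then a final (hypothetical) "done" event."""
    for token in tokens:
        yield sse_event(token, event="token")
    yield sse_event("end", event="done")


if __name__ == "__main__":
    for frame in stream_tokens(["Hel", "lo"]):
        print(frame, end="")
```

Because stream_tokens is a generator, the first event can be written as soon as the first token exists rather than after the whole response is assembled. The response would be served with Content-Type: text/event-stream.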
The approach
The practical approach: stream long responses by default, monitor time-to-first-byte as a first-class metric, disable load-balancer (LB) buffering for streaming routes, document the buffering chain, and handle backpressure explicitly. Applied consistently, this discipline produces a fast user experience.
- Per-response streaming. Long responses stream; the user sees data while the rest is computed.
- Monitor TTFB. Per-response time-to-first-byte; the user-facing metric that captures perceived latency.
- Disable LB buffering. proxy_buffering off in nginx for streaming routes; otherwise the LB defeats the streaming.
- Document the buffering. Per-tier buffering documented; the chain from origin to user must not silently buffer.
- Backpressure handling. Per-stream backpressure; slow consumers must not crash the producer.
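The backpressure and time-to-first-byte items above can be sketched with a bounded asyncio queue. This is a minimal in-process model, not a full HTTP server: the queue stands in for the connection, and QUEUE_LIMIT, the chunk names, and the function names are assumptions made for the example. The consumer records the time to its first chunk as a stand-in for TTFB.

```python
import asyncio
import time
from typing import List, Optional, Tuple

QUEUE_LIMIT = 4  # hypothetical bound on chunks buffered per stream


async def producer(queue: asyncio.Queue, n_chunks: int, peak: List[int]) -> None:
    for i in range(n_chunks):
        # put() suspends while the queue is full, so a slow consumer
        # slows the producer instead of growing memory without bound.
        await queue.put(f"chunk-{i}")
        peak[0] = max(peak[0], queue.qsize())
    await queue.put(None)  # sentinel: end of stream


async def slow_consumer(
    queue: asyncio.Queue, delay: float
) -> Tuple[List[str], Optional[float], float]:
    received: List[str] = []
    start = time.monotonic()
    ttfb: Optional[float] = None
    while True:
        chunk = await queue.get()
        if chunk is None:
            break
        if ttfb is None:
            ttfb = time.monotonic() - start  # time to first chunk
        received.append(chunk)
        await asyncio.sleep(delay)  # simulate a slow reader
    return received, ttfb, time.monotonic() - start


async def run_stream(n_chunks: int = 20, delay: float = 0.005):
    queue: asyncio.Queue = asyncio.Queue(maxsize=QUEUE_LIMIT)
    peak = [0]
    task = asyncio.create_task(producer(queue, n_chunks, peak))
    received, ttfb, total = await slow_consumer(queue, delay)
    await task
    return received, ttfb, total, peak[0]


if __name__ == "__main__":
    received, ttfb, total, peak = asyncio.run(run_stream())
    print(f"chunks={len(received)} ttfb={ttfb:.4f}s total={total:.4f}s peak_queue={peak}")
```

The bound is the design choice that matters here: with an unbounded queue the producer would race ahead of a slow reader and hold every unsent chunk in memory.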
Why this compounds
Streaming discipline compounds across services. Each streaming response produces ongoing user-experience improvement; the team’s HTTP expertise grows; new endpoints ship streaming-aware on the first try.
- Better user experience. Lower TTFB feels faster; the user does not feel the total latency.
- Better resource efficiency. Streaming reduces memory peaks; the server does not buffer the whole response.
- Better LLM integration. Token streaming matches LLM output; the user reads as the model generates.
- Institutional knowledge. Each stream teaches HTTP patterns; the team’s user-experience muscle grows.
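The resource-efficiency point above can be checked with a small experiment. The sketch below compares peak traced memory when a response is joined into one body before sending and when each chunk is handed off as it is produced. CHUNK_SIZE, CHUNK_COUNT, and the function names are hypothetical, and the sink simply discards data, so the figures only illustrate the shape of the difference.

```python
import tracemalloc
from typing import Callable, Iterator

CHUNK_SIZE = 64 * 1024  # hypothetical chunk size in bytes
CHUNK_COUNT = 100       # hypothetical response length in chunks


def render_chunks() -> Iterator[bytes]:
    """Produce the response one chunk at a time."""
    for _ in range(CHUNK_COUNT):
        yield bytes(CHUNK_SIZE)


def send_buffered(write: Callable[[bytes], None]) -> None:
    # Build the full body first, then send it in one piece.
    body = b"".join(render_chunks())
    write(body)


def send_streamed(write: Callable[[bytes], None]) -> None:
    # Hand each chunk to the transport as soon as it exists.
    for chunk in render_chunks():
        write(chunk)


def peak_bytes(send: Callable[[Callable[[bytes], None]], None]) -> int:
    """Return peak traced allocation while `send` runs with a discarding sink."""
    tracemalloc.start()
    try:
        send(lambda data: None)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


if __name__ == "__main__":
    print("buffered peak:", peak_bytes(send_buffered))
    print("streamed peak:", peak_bytes(send_streamed))
```

The buffered path must hold at least the whole body at once, while the streamed path only needs the chunk currently in flight.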
Streaming discipline is an engineering discipline that pays off over years. Nova AI Ops integrates with HTTP telemetry, surfaces patterns, and supports the team’s user-experience discipline.