Response Streaming

Stream long responses.

Overview

Response streaming starts sending response data before the full response is computed. Total latency is the easy metric to measure; perceived latency is what determines whether the user experiences the system as fast or slow.
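A minimal worked sketch in Python makes the distinction concrete. It rests on simplifying assumptions: the response is computed in chunks with known, hypothetical compute times, and network transfer is treated as instantaneous. The helper name ttfb_and_total is illustrative, not from any library.

```python
from itertools import accumulate
from typing import List, Tuple


def ttfb_and_total(chunk_ms: List[float], streaming: bool) -> Tuple[float, float]:
    """Return (time-to-first-byte, total latency) in ms.

    chunk_ms[i] is the (hypothetical) time to compute chunk i.
    Assumes sending is instantaneous, so only compute time counts.
    """
    if not chunk_ms:
        raise ValueError("need at least one chunk")
    # Cumulative time at which each chunk is ready on the server.
    ready = list(accumulate(chunk_ms))
    total = ready[-1]
    # Buffered: nothing is sent until the last chunk is ready.
    # Streaming: the first chunk is sent as soon as it is ready.
    ttfb = ready[0] if streaming else total
    return ttfb, total


chunks = [50.0, 50.0, 50.0, 50.0]  # hypothetical: four 50 ms chunks
print(ttfb_and_total(chunks, streaming=False))  # (200.0, 200.0)
print(ttfb_and_total(chunks, streaming=True))   # (50.0, 200.0)
```

Both modes finish at 200 ms, but streaming delivers the first byte after 50 ms instead of 200 ms; that gap is the perceived-latency win.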

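Stream long responses. Send the first bytes as soon as they are ready; time-to-first-byte, not total latency, matches what the user perceives, because the user starts reading before the response completes.
Server-sent events (SSE). The server pushes discrete events over one long-lived HTTP response; the right shape for one-way streaming such as notifications and dashboards.
Chunked transfer encoding and HTTP/2. HTTP/1.1 chunked encoding is universally supported; HTTP/2 does not use chunked encoding at all, carrying the body in its own DATA frames and multiplexing many streams over one connection.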
LLM token streaming. Per-token push; matches AI workloads where the user reads as the model generates.
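The wire formats above are simple enough to encode by hand, which helps when debugging what actually crosses each hop. The Python sketch below follows the text/event-stream format (event, id, and data fields, each event terminated by a blank line) and HTTP/1.1 chunk framing (hexadecimal size, CRLF, payload, CRLF, with a zero-size chunk ending the body). Chunk framing applies to HTTP/1.1 only, since HTTP/2 frames the body itself. The function names and the SSE_HEADERS dict are hypothetical; X-Accel-Buffering is an nginx-specific response header that turns off proxy buffering for a single response. In practice a web framework normally does this encoding for you.

```python
from typing import Optional

# Hypothetical response headers for an SSE endpoint.
SSE_HEADERS = {
    "Content-Type": "text/event-stream",
    "Cache-Control": "no-cache",
    # nginx-specific: disables proxy buffering for this response only.
    "X-Accel-Buffering": "no",
}

# HTTP/1.1 chunked body terminator: a zero-size chunk, then an empty trailer section.
LAST_CHUNK = b"0\r\n\r\n"


def encode_sse_event(data: str, event: Optional[str] = None,
                     event_id: Optional[str] = None) -> bytes:
    """Encode one server-sent event in the text/event-stream format."""
    for name, value in (("event", event), ("id", event_id)):
        if value is not None and ("\n" in value or "\r" in value):
            raise ValueError(f"SSE {name} field must be a single line")
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    if event_id is not None:
        lines.append(f"id: {event_id}")
    # Normalize line endings, then give each payload line its own
    # "data:" field; the client rejoins them with "\n".
    normalized = data.replace("\r\n", "\n").replace("\r", "\n")
    for line in normalized.split("\n"):
        lines.append(f"data: {line}")
    # A blank line terminates the event.
    return ("\n".join(lines) + "\n\n").encode("utf-8")


def encode_chunk(payload: bytes) -> bytes:
    """Frame one HTTP/1.1 chunk: hex size, CRLF, payload, CRLF."""
    if not payload:
        # A zero-size chunk marks the end of the body, so refuse it here.
        raise ValueError("empty payload would end the body; send LAST_CHUNK instead")
    return f"{len(payload):X}\r\n".encode("ascii") + payload + b"\r\n"
```

For example, encode_sse_event("hello\nworld", event="token", event_id="1") yields b"event: token\nid: 1\ndata: hello\ndata: world\n\n", and encode_chunk(b"hello") yields b"5\r\nhello\r\n". An LLM token stream fits the same shape, with one SSE event (or one chunk) per token or small group of tokens.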

The approach

The practical approach: stream long responses by default, monitor time-to-first-byte as a first-class metric, disable load-balancer (LB) buffering on streaming routes, document the buffering chain, and handle backpressure explicitly. Applied consistently, this discipline produces a fast user experience.

Per-response streaming. Long responses stream; the user sees data while the rest is computed.
Monitor TTFB. Record time-to-first-byte per response; it is the user-facing metric that captures perceived latency.
Disable LB buffering. Set proxy_buffering off in nginx for streaming routes (or send the X-Accel-Buffering: no response header); otherwise the LB collects the response before forwarding it and defeats the streaming.
Document the buffering. Record which tiers buffer and how much; no hop in the chain from origin to user may silently buffer.
Backpressure handling. Bound each stream's buffer; a slow consumer should slow the producer down, not exhaust its memory or crash it.
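The TTFB and backpressure items can be sketched together with asyncio. A bounded asyncio.Queue stands in for the per-stream buffer, so queue.put() suspends the producer whenever the simulated slow consumer falls behind, and the consumer records TTFB alongside total time. The produce, consume, and stream helpers, the buffer size, and the simulated delay are all illustrative assumptions, not a prescribed implementation.

```python
import asyncio
import time
from typing import List, Optional, Tuple


async def produce(queue: asyncio.Queue, chunks: List[bytes]) -> int:
    """Push chunks into a bounded queue; return the peak depth seen after each put.

    queue.put() suspends while the queue is full, so a slow consumer
    throttles the producer instead of letting memory grow without bound.
    """
    peak = 0
    for chunk in chunks:
        await queue.put(chunk)
        peak = max(peak, queue.qsize())
    await queue.put(None)  # sentinel: end of stream
    return peak


async def consume(queue: asyncio.Queue,
                  delay_s: float) -> Tuple[List[bytes], Optional[float], float]:
    """Simulated slow client: returns (chunks, TTFB in s, total time in s)."""
    start = time.monotonic()
    ttfb: Optional[float] = None
    received: List[bytes] = []
    while True:
        chunk = await queue.get()
        if chunk is None:
            break
        if ttfb is None:
            ttfb = time.monotonic() - start  # first byte has arrived
        received.append(chunk)
        await asyncio.sleep(delay_s)  # simulate a slow network link
    return received, ttfb, time.monotonic() - start


async def stream(chunks: List[bytes], max_buffered: int = 2, delay_s: float = 0.01):
    """Run producer and consumer concurrently over one bounded buffer."""
    queue: asyncio.Queue = asyncio.Queue(maxsize=max_buffered)
    peak, (received, ttfb, total) = await asyncio.gather(
        produce(queue, chunks), consume(queue, delay_s)
    )
    return received, ttfb, total, peak


if __name__ == "__main__":
    data = [f"chunk-{i}\n".encode() for i in range(5)]
    received, ttfb, total, peak = asyncio.run(stream(data))
    print(f"TTFB {ttfb * 1000:.1f} ms, total {total * 1000:.1f} ms, peak buffered {peak}")
```

In a real asyncio server that writes through a StreamWriter, awaiting writer.drain() after each write plays the same role as the bounded queue: it pauses the writer while the transport's write buffer is above its high-water mark. Bounding the buffer is the design choice that matters; the exact bound is a tuning parameter.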

Why this compounds

Streaming discipline compounds across services. Each endpoint that streams keeps improving the user experience for as long as it runs; the team’s HTTP expertise grows; and new endpoints ship streaming-aware on the first try.

Better user experience. Lower TTFB feels faster; the user notices total latency far less.
Better resource efficiency. Streaming lowers peak memory; the server does not hold the whole response in memory.
Better LLM integration. Token streaming matches LLM output; the user reads as the model generates.
Institutional knowledge. Each stream teaches HTTP patterns; the team’s user-experience muscle grows.

Streaming is an engineering discipline that pays off over years. Nova AI Ops integrates with HTTP telemetry, surfaces patterns, and supports the team’s user-experience discipline.