Sampling Strategies for Distributed Tracing: Head, Tail, and Adaptive
Tracing every request is expensive; tracing none is useless. The strategy you pick decides what you can and cannot diagnose.
Why sampling is the cost knob
A modern service emits 1-3 spans per request. At 10k requests per second, that is 600k-1.8M spans per minute, on the order of a billion spans per day: storage and network you cannot afford. Sampling is the lever.
The wrong strategy makes the problem look solved while quietly throwing away the spans you most need.
Head-based sampling
- Head-based sampling decides at the start of the request, before anything about it is known. ‘Sample 1% of all traces’: the trace ID is hashed to a value in [0, 1); if it falls below 0.01, every span in the trace is kept; otherwise every span is dropped. (A sketch follows this list.)
- Pros: dead simple, deterministic, no storage at the collector.
- Cons: blind to slow or errored requests; you keep the same random 1% whether a trace is interesting or not. The slow trace you need is, statistically, almost certain to be lost.
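A minimal sketch of head-based sampling with the OpenTelemetry Go SDK. The 1% ratio and the bare-bones wiring are illustrative, but `TraceIDRatioBased` and `ParentBased` are the SDK's real samplers:

```go
package main

import (
	"context"

	sdktrace "go.opentelemetry.io/otel/sdk/trace"
)

func main() {
	// Keep ~1% of traces, decided deterministically from the trace ID
	// at the root span, before anything about the request is known.
	sampler := sdktrace.ParentBased(sdktrace.TraceIDRatioBased(0.01))

	// Child spans inherit the root's decision, so a trace is kept
	// whole or dropped whole instead of fragmenting mid-flight.
	tp := sdktrace.NewTracerProvider(sdktrace.WithSampler(sampler))
	defer func() { _ = tp.Shutdown(context.Background()) }()
}
```

The `ParentBased` wrapper is the piece people forget: it makes downstream services honor the root's decision instead of re-sampling, which is what keeps traces whole even when services run different ratios.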
Tail-based sampling
- Tail-based sampling waits until the trace finishes, then decides with full knowledge of what happened: ‘Keep 100% of traces with errors, 100% of traces above p99 latency, 1% of normal traces.’ (See the sketch after this list.)
- Pros: keeps what you actually need; storage matches signal.
- Cons: requires the OTel Collector or equivalent to buffer every span until the trace completes; complex configuration; memory pressure on the collector.
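In production this logic typically runs in the OTel Collector's tail-sampling processor; the toy Go sketch below only shows the shape of the decision. The `Span` struct, the 500ms threshold, and the 1% baseline are illustrative assumptions, not the Collector's API:

```go
package main

import (
	"fmt"
	"hash/fnv"
	"time"
)

// Span is a deliberately simplified stand-in for a real span, carrying
// only the fields the sampling decision needs.
type Span struct {
	TraceID  string
	IsError  bool
	Duration time.Duration
}

// keepTrace runs once the whole trace has been buffered. Policies, in
// order: keep every trace with an error, keep every slow trace, keep a
// deterministic baseline fraction of everything else.
func keepTrace(spans []Span, slow time.Duration, baseline float64) bool {
	if len(spans) == 0 {
		return false
	}
	var longest time.Duration
	for _, s := range spans {
		if s.IsError {
			return true // one error span keeps the whole trace
		}
		if s.Duration > longest {
			longest = s.Duration // root span duration ~ trace latency
		}
	}
	if longest >= slow {
		return true
	}
	// Baseline: hash the trace ID into [0, 1) so the same trace always
	// gets the same decision, mirroring head-based determinism.
	h := fnv.New32a()
	h.Write([]byte(spans[0].TraceID))
	return float64(h.Sum32())/float64(1<<32) < baseline
}

func main() {
	trace := []Span{
		{TraceID: "abc123", Duration: 40 * time.Millisecond},
		{TraceID: "abc123", Duration: 750 * time.Millisecond},
	}
	// Kept: 750ms exceeds the 500ms slow-trace threshold.
	fmt.Println(keepTrace(trace, 500*time.Millisecond, 0.01))
}
```

The buffering cost in the cons above falls out directly: keepTrace cannot run until every span for the trace has arrived, so the collector holds all of them in memory until then.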
Adaptive sampling
- Adaptive sampling adjusts the rate dynamically based on traffic: low traffic → sample more (so you have data); high traffic → sample less (so cost stays bounded). (See the sketch after this list.)
- Pros: the bill stays predictable; coverage stays useful.
- Cons: harder to reason about ‘would this trace have been sampled?’; comparisons across time windows get muddier because the rate keeps moving.
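One common implementation is a feedback loop that divides a span budget by observed throughput, recomputed on an interval. A minimal sketch, assuming a fixed budget and a floor rate (all names here are hypothetical):

```go
package main

import "fmt"

// adaptiveRate returns the sampling probability that brings expected
// span throughput down to the budget, clamped to [floor, 1]. Call it
// periodically (say, once a minute) with the latest observed rate.
func adaptiveRate(observedSpansPerSec, budgetSpansPerSec, floor float64) float64 {
	if observedSpansPerSec <= budgetSpansPerSec {
		return 1 // under budget: keep everything
	}
	r := budgetSpansPerSec / observedSpansPerSec
	if r < floor {
		return floor // never go fully blind, even in a traffic spike
	}
	return r
}

func main() {
	// Quiet period: 500 spans/s against a 5,000 spans/s budget -> sample 100%.
	fmt.Println(adaptiveRate(500, 5000, 0.001))
	// Traffic spike: 2M spans/s -> rate drops to 0.0025, cost stays bounded.
	fmt.Println(adaptiveRate(2_000_000, 5000, 0.001))
}
```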
The pragmatic combo: tail-based sampling with an adaptive baseline rate. The tail policies keep the interesting traces, the adaptive baseline bounds cost, and traffic spikes are absorbed automatically.
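In terms of the sketches above, the combination is plain composition: feed the output of adaptiveRate in as the baseline argument of keepTrace, recomputed once per interval from observed span volume.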
Antipatterns
- 1% head-based forever. Your slow traces almost never get sampled, and they are the ones you need.
- 100% sampling because storage is ‘cheap.’ It is not, at scale.
- Different sampling rates per service. Each service makes its own keep/drop call, so trace fragments come back instead of whole traces, and the cross-service view breaks.
What to do this week
Three moves:
- If you are head-sampling, set up the OTel Collector with tail-based sampling for one service.
- Add ‘keep all errors’ and ‘keep all p95+ latency’ rules.
- Watch the storage bill for two weeks; tune the baseline rate to keep it bounded.