Traces Cost Optimization
Sampling drives trace cost.
Overview
Trace cost is dominated by sampling rate. Full sampling is rarely affordable at scale; smart sampling preserves investigation value at a fraction of the cost. The discipline is picking the right sampling strategy per workload.
- Sampling drives trace cost. Every retained trace incurs ingest and storage cost, so the sampling rate is the dominant cost lever.
- Head-based sampling. The keep-or-drop decision is made up front, when the trace starts. Cheap to implement, but it can miss interesting traces because errors and latency are not yet known at decision time.
- Tail-based sampling. The decision is deferred until the trace completes, so errors and slow traces can always be kept. Widely treated as current best practice.
- Per-tier sampling plus error prioritisation. Different rates per service tier; errors and slow traces always retained regardless of base rate.
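To make the head-based option concrete, here is a minimal sketch of an up-front decision that depends only on the trace ID. The function name head_sample and the choice of SHA-256 are assumptions made for illustration, not part of any particular tracing library. Hashing the ID means every service that sees the same trace reaches the same decision, but nothing about errors or latency is available at this point.

```python
import hashlib


def head_sample(trace_id: str, rate: float) -> bool:
    """Decide at trace start whether to keep a trace, using only its ID.

    Hypothetical helper: `rate` is the fraction of traces to keep (0.0 to 1.0).
    """
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0.0 and 1.0")
    # Hash the trace ID so the decision is stable across services and retries.
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer in [0, 2**64).
    value = int.from_bytes(digest[:8], "big")
    # Keep the trace when its hash falls below the rate-scaled threshold.
    return value < rate * 2**64
```

Because the outcome is a pure function of the trace ID, a trace that later fails is dropped just as readily as a healthy one, which is the weakness the tail-based approach addresses.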
The approach
Four habits keep trace cost matched to investigation value: tail-based sampling as the default, an always-keep policy for error traces, per-tier sampling rates, and quarterly cost audits backed by a documented policy.
- Tail-based sampling. Decide what to keep after the trace completes. Errors and high-latency traces survive; the rest are sampled at the base rate.
- Error prioritisation. Always keep traces with errors or unusual latency. The investigation set is the bad traces, not random ones.
- Per-tier sampling rate. Critical services sample at 10 percent; internal batch jobs sample at 1 percent. Match the rate to investigation need.
- Quarterly audit plus documented policy. Quarterly trace-cost review catches drift; per-team sampling policy lives in the wiki.
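The habits above can be sketched as a single decision made once a trace has completed. This is an illustrative sketch, not a specific collector's API: CompletedTrace, keep_trace, the tier names, and the 2000 ms slow threshold are all assumptions, while the 10 percent and 1 percent base rates come from the text.

```python
import random
from dataclasses import dataclass
from typing import Callable, Tuple

# Per-tier base rates from the text; the tier names are hypothetical.
BASE_RATES = {"critical": 0.10, "batch": 0.01}
# Assumed latency threshold above which a trace counts as slow.
SLOW_THRESHOLD_MS = 2000.0


@dataclass
class CompletedTrace:
    trace_id: str
    tier: str
    duration_ms: float
    has_error: bool


def keep_trace(
    trace: CompletedTrace,
    rng: Callable[[], float] = random.random,
) -> Tuple[bool, str]:
    """Return (keep, reason) for a finished trace."""
    # Error traces are always kept, regardless of the tier's base rate.
    if trace.has_error:
        return True, "error"
    # Slow traces are always kept as well.
    if trace.duration_ms >= SLOW_THRESHOLD_MS:
        return True, "slow"
    # Everything else is sampled at the base rate for its tier.
    if rng() < BASE_RATES[trace.tier]:
        return True, "base_rate"
    return False, "dropped"
```

Returning the reason alongside the decision is a design choice made for this sketch: it lets a quarterly audit count how much retained volume comes from errors, slow traces, and base-rate sampling.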
Why this compounds
Each correctly sampled trace produces investigation value at controlled cost. The team learns sampling through repeated review; new services ship with rates that match their tier from day one.
- Cost efficiency. Right sampling matches workload. Critical services keep what matters; long-tail services do not pay for noise.
- Investigation quality. Error traces preserved. The traces operators actually need during incidents are the ones that survive sampling.
- Operational fit. Right policy per tier matches priorities. Customer-facing services get richer traces; internal jobs get sparse ones.
- Year-one investment, year-two habit. The first policy is an investment. By year two, every new service ships with a sampling decision.
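The cost-efficiency point can be checked with simple arithmetic. The sketch below estimates how many traces survive when errors and slow traces are always kept and the remainder is sampled at the base rate. The function names and the example volumes are hypothetical; only the 10 percent rate comes from the text.

```python
def retained_fraction(base_rate: float, always_keep_fraction: float) -> float:
    """Fraction of traces retained under an always-keep policy plus base-rate sampling.

    always_keep_fraction is the assumed share of traces with errors or high latency.
    """
    # Always-kept traces survive in full; the remainder is sampled at base_rate.
    return always_keep_fraction + (1.0 - always_keep_fraction) * base_rate


def retained_per_day(
    traces_per_day: int, base_rate: float, always_keep_fraction: float
) -> float:
    """Estimated number of traces retained per day."""
    return traces_per_day * retained_fraction(base_rate, always_keep_fraction)


# Hypothetical critical service: 1,000,000 traces per day, 2 percent errors or slow.
critical_retained = retained_per_day(1_000_000, 0.10, 0.02)
```

With these assumed inputs the retained fraction is 0.02 + 0.98 × 0.10 = 0.118, or roughly 118,000 traces per day. The always-keep share sets a floor on cost that the base rate cannot lower, which is one thing a quarterly audit can watch for.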