Trace Sampling Strategy by Service Tier
Critical services sample at 100%; low-stakes services sample at 1%. The tier model and how to apply it without losing debugging value.
Define the tiers
Tier 0: customer-critical paths (login, checkout). 100% sampling.
Tier 3: internal services with low impact. 1% head sampling.
Tiers 1 and 2 sit between these and use tail sampling.
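A minimal sketch of head sampling by tier, assuming a hypothetical `HEAD_SAMPLE_RATE` table and `head_sample` function (neither comes from any library). The keep/drop decision uses only the low 64 bits of the trace ID. This is similar in spirit to OpenTelemetry's trace-ID-ratio sampler, so the same trace always gets the same decision.

```python
# Hypothetical tier -> head-sampling rate table. Tiers 1 and 2 are absent
# because this policy handles them with tail sampling instead.
HEAD_SAMPLE_RATE = {0: 1.0, 3: 0.01}

# Decide on the low 64 bits of the 128-bit trace ID.
_ID_SPACE = 1 << 64


def head_sample(trace_id: int, tier: int) -> bool:
    """Keep or drop a trace at its root span, before any downstream work.

    The result depends only on the trace ID and the tier's rate, so the
    same trace always gets the same decision.
    """
    rate = HEAD_SAMPLE_RATE[tier]
    threshold = int(rate * _ID_SPACE)
    return trace_id % _ID_SPACE < threshold
```

Deciding from the trace ID rather than a random draw means a retry or a second look at the same trace never flips the outcome.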
Tail sampling for tiers 1 and 2
Sample 100% of slow or error traces and 1% of healthy ones. This captures the traces that show bugs without paying to store the boring ones.
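The tail-sampling rule can be sketched as a decision function that runs once a trace is complete. This illustrates the rule and is not the collector's code. `Span`, `tail_sample`, and the 500 ms "slow" threshold are assumptions. The three checks loosely correspond to the `status_code`, `latency`, and `probabilistic` policies of the OTel collector's tail-sampling processor.

```python
from dataclasses import dataclass

# Assumed cutoffs for illustration; tune them per service.
SLOW_THRESHOLD_MS = 500.0
HEALTHY_KEEP_RATE = 0.01
_ID_SPACE = 1 << 64


@dataclass
class Span:
    duration_ms: float
    is_error: bool


def tail_sample(trace_id: int, spans: list[Span]) -> bool:
    """Decide once the trace is complete, when errors and latency are known."""
    # Keep 100% of traces that contain an error.
    if any(s.is_error for s in spans):
        return True
    # Keep 100% of slow traces. The longest span (usually the root) stands in
    # for end-to-end latency here.
    if max((s.duration_ms for s in spans), default=0.0) >= SLOW_THRESHOLD_MS:
        return True
    # Healthy trace: keep about 1%, chosen deterministically from the trace ID.
    return trace_id % _ID_SPACE < int(HEALTHY_KEEP_RATE * _ID_SPACE)
```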
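Implementation: the tail-sampling processor in the OTel collector. It is not free. The collector must hold every span in memory until its trace's sampling decision is made, so budget memory and CPU for it.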
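One way to size that budget is a back-of-envelope estimate. Each span is held until its trace's decision wait (the processor's `decision_wait` setting) expires. The buffer therefore holds roughly arrival rate times wait time worth of spans (Little's law). The traffic figures below are hypothetical; substitute your own.

```python
def tail_buffer_bytes(spans_per_second: float,
                      decision_wait_s: float,
                      avg_span_bytes: float) -> float:
    """Rough steady-state memory for spans awaiting a tail-sampling decision.

    Spans in buffer ~= arrival rate x time each span waits (Little's law).
    """
    return spans_per_second * decision_wait_s * avg_span_bytes


# Hypothetical load: 20,000 spans/s, 10 s decision wait, ~1 KB per span in memory.
estimate = tail_buffer_bytes(20_000, 10, 1_000)
print(f"{estimate / 1e6:.0f} MB")  # 200 MB
```

Doubling either the traffic or the decision wait doubles the estimate, which is why a longer wait for slow traces has a direct memory cost.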
Review quarterly
Service tiers shift; new services launch. Quarterly review keeps the policy current.
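Part of the quarterly review can be automated. The sketch below assumes two inputs: a list of service names currently emitting traces, and a hypothetical `tiers` mapping of service to tier. It reports services with no tier (new launches), tiered services that no longer emit traces, and whether the review is overdue. `review_report`, the 92-day interval, and the service names are all illustrative.

```python
from datetime import date, timedelta

REVIEW_INTERVAL = timedelta(days=92)  # roughly one quarter


def review_report(seen: set[str], tiers: dict[str, int],
                  last_review: date, today: date) -> dict:
    """Summarize what the quarterly tier review needs to look at.

    seen:  service names currently emitting traces (hypothetical input)
    tiers: the current service -> tier assignment (hypothetical input)
    """
    return {
        # Launched since the last review, so no tier (and no sampling rate) yet.
        "untiered": sorted(seen - set(tiers)),
        # Still tiered but no longer emitting traces: candidates to retire.
        "stale": sorted(set(tiers) - seen),
        "overdue": today - last_review > REVIEW_INTERVAL,
    }


report = review_report(
    seen={"checkout", "search", "recs-v2"},
    tiers={"checkout": 0, "search": 2, "recs-v1": 3},
    last_review=date(2024, 1, 8),
    today=date(2024, 5, 2),
)
print(report)  # {'untiered': ['recs-v2'], 'stale': ['recs-v1'], 'overdue': True}
```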
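Keep a cost dashboard per tier. Tier 0 samples at 100%, so its cost should grow in line with its traffic. If tier 0 cost is exploding without a matching rise in traffic, the implementation has a leak.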
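A hedged sketch of that check follows. Tier 0 samples everything, so its cost per request should stay roughly flat. The example flags a period whose cost per request exceeds a baseline by more than a chosen factor. `TierUsage`, `leak_suspected`, the 1.5 factor, and the monthly figures are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class TierUsage:
    cost: float      # tracing spend for the period, in any currency unit
    requests: int    # requests served by the tier in the same period


def leak_suspected(baseline: TierUsage, current: TierUsage,
                   max_growth: float = 1.5) -> bool:
    """Flag a tier whose trace cost per request grew by more than max_growth.

    At 100% sampling, cost should scale with traffic, so a rising cost per
    request points at the instrumentation rather than at load.
    """
    base = baseline.cost / baseline.requests
    now = current.cost / current.requests
    return now > base * max_growth


# Hypothetical monthly numbers for tier 0.
jan = TierUsage(cost=1_000.0, requests=1_000_000)
feb = TierUsage(cost=2_000.0, requests=2_000_000)  # traffic doubled: fine
mar = TierUsage(cost=4_000.0, requests=2_000_000)  # same traffic, 2x cost
print(leak_suspected(jan, feb), leak_suspected(jan, mar))  # False True
```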