p99 and Tail Latency: The Number You Cannot Ignore
Average latency is comforting and wrong. p99 is uncomfortable and right.
Why average lies
Two services with the same average latency can have very different p99s, because the average folds a few very slow requests into many fast ones.
Users in the tail are real users; their experience is the experience.
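A small sketch makes the point concrete. The two latency samples below are invented for illustration, and percentile() is a hypothetical nearest-rank helper, not a library function.

```python
def percentile(samples, p):
    """Nearest-rank percentile for an integer p in 1..100."""
    ordered = sorted(samples)
    # Smallest rank such that at least p% of samples are at or below it.
    rank = (p * len(ordered) + 99) // 100
    return ordered[max(rank, 1) - 1]


# Hypothetical latencies in milliseconds, 1000 requests per service.
steady = [100] * 1000               # every request takes 100 ms
spiky = [80] * 980 + [1080] * 20    # 2% of requests take over a second

for name, samples in (("steady", steady), ("spiky", spiky)):
    mean = sum(samples) / len(samples)
    print(f"{name}: mean={mean:.0f} ms  p50={percentile(samples, 50)} ms  "
          f"p99={percentile(samples, 99)} ms")
```

Both services report a 100 ms mean, yet the p99 is 100 ms for one and 1080 ms for the other.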
Four causes of tail growth
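- Lock contention. Requests queue behind a shared lock, and the unlucky ones wait longest.
- Garbage collection. A collector pause stalls every request in flight on that process.
- Cold caches. The first requests after a deploy, restart, or eviction pay the full miss cost.
- Resource saturation in part of the fleet. A few overloaded hosts slow down the requests routed to them.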
Per-cause mitigation
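Lock contention: use lock-free data structures, or shard the data so each lock guards a smaller slice.
GC: tune the collector for pause time; run a smaller heap so each collection has less to scan.
Cold caches: warm caches before taking traffic; keep per-region caches.
Saturation: balance load by actual host load; rebalance traffic away from hot hosts.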
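As one illustration of the sharding idea for lock contention, here is a minimal sketch of a counter split across several independently locked shards. ShardedCounter and its methods are hypothetical names, not taken from any library. In CPython the global interpreter lock limits real parallelism, so treat this as a picture of the structure rather than a benchmark.

```python
import threading


class ShardedCounter:
    """Per-key counters spread over several shards, each with its own lock."""

    def __init__(self, shards=16):
        self._locks = [threading.Lock() for _ in range(shards)]
        self._counts = [{} for _ in range(shards)]

    def _shard(self, key):
        # Keys that hash to different shards never wait on the same lock.
        return hash(key) % len(self._locks)

    def increment(self, key, amount=1):
        i = self._shard(key)
        with self._locks[i]:
            self._counts[i][key] = self._counts[i].get(key, 0) + amount

    def get(self, key):
        i = self._shard(key)
        with self._locks[i]:
            return self._counts[i].get(key, 0)
```

With a single global lock, every writer queues behind every other writer. With N shards, a writer only competes with writers whose keys land in the same shard.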
Tail-aware monitoring
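Record latency in histograms (in Prometheus, the _bucket series) rather than averages, so percentiles can be computed afterward.
Alert on percentiles such as p99, not just on the average.
Break percentiles down per tenant to catch tails that affect only one tenant.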
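To show how a percentile comes out of cumulative histogram buckets, here is a rough sketch that interpolates linearly inside the bucket containing the target rank, loosely following the approach of Prometheus's histogram_quantile. The function name estimate_quantile and the bucket values are assumptions made up for this example.

```python
import math


def estimate_quantile(q, buckets):
    """Estimate quantile q (0..1) from (upper_bound, cumulative_count) pairs.

    Buckets must be sorted by upper bound and end with an infinite bound,
    like the le="+Inf" bucket of a Prometheus histogram.
    """
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    target = q * total
    prev_bound, prev_count = 0.0, 0  # assume the first bucket starts at 0
    for bound, count in buckets:
        if count >= target:
            in_bucket = count - prev_count
            if math.isinf(bound) or in_bucket == 0:
                # No upper edge to interpolate toward; report the lower edge.
                return prev_bound
            fraction = (target - prev_count) / in_bucket
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound


# Hypothetical request-duration buckets, in seconds.
buckets = [(0.1, 900), (0.25, 980), (0.5, 995), (1.0, 1000), (math.inf, 1000)]
print(f"estimated p99: {estimate_quantile(0.99, buckets):.3f} s")
```

The estimate can only be as precise as the bucket boundaries, so it pays to place boundaries close to the latency targets you alert on.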
Antipatterns
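- Average-only monitoring. It misses the tail entirely.
- Dismissing p99 as an ‘outlier.’ Those requests reach real users.
- Optimizing the average. It is the wrong target: the average can improve while the tail gets worse.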
What to do this week
Three moves. (1) Take your slowest production endpoint, find which of the four causes drives its tail, and apply the matching mitigation. (2) Measure p99 before and after. (3) Document the win and ship the runbook so the team can reproduce it.
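For the before/after measurement, a minimal sketch might look like the following. It assumes you can export raw latency samples in milliseconds from a load test or from your monitoring. compare_p99 is a hypothetical helper, and the sample values are invented.

```python
import statistics


def p99(samples_ms):
    # quantiles(n=100) returns 99 cut points; the last is the 99th percentile.
    return statistics.quantiles(samples_ms, n=100)[-1]


def compare_p99(before_ms, after_ms):
    """Return p99 before and after a change, plus the relative change."""
    before, after = p99(before_ms), p99(after_ms)
    return {
        "before_ms": before,
        "after_ms": after,
        "change_pct": (after - before) / before * 100,
    }


# Hypothetical samples: the fix shrinks the slow fraction from 2% to 0.5%.
before = [10] * 980 + [500] * 20
after = [10] * 995 + [500] * 5
print(compare_p99(before, after))
```

Use the same traffic mix and sample size for both runs; otherwise the two p99 values are not comparable.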