Performance | Intermediate | By Samson Tanimawo, PhD | Published Oct 20, 2026 | 9 min read

p99 and Tail Latency: The Number You Cannot Ignore

Average latency is comforting and wrong. p99 is uncomfortable and right.

Why average lies

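Two services with the same average latency can have very different p99s: one may answer every request at a steady pace, while the other is fast for most requests and very slow for a few. The mean cannot tell them apart.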

Users in the tail are real users; their experience is the experience.
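To make the gap between mean and p99 concrete, here is a minimal sketch with made-up numbers. The two latency samples and the nearest-rank `percentile` helper are illustrative assumptions, not data from a real service.

```python
import statistics

def percentile(samples, pct):
    """Nearest-rank percentile for an integer pct (e.g. 99).

    Returns the smallest sample such that at least pct% of all
    samples are less than or equal to it.
    """
    ordered = sorted(samples)
    rank = (pct * len(ordered) + 99) // 100  # integer ceil(pct * n / 100)
    return ordered[rank - 1]

# Hypothetical latencies in milliseconds, 1,000 requests per service.
steady = [100] * 1000              # every request takes 100 ms
spiky = [50] * 980 + [2550] * 20   # 2% of requests take 2.55 s

for name, samples in (("steady", steady), ("spiky", spiky)):
    print(name, "mean:", statistics.mean(samples), "p99:", percentile(samples, 99))
```

Both samples have a mean of 100 ms, yet the p99 is 100 ms for the steady service and 2550 ms for the spiky one. A dashboard that shows only the average would report them as identical.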

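Four causes of tail growth

Tails usually grow for one of four reasons: lock contention, GC pauses, cold caches, or saturation.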

Per-cause mitigation

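Lock contention: use lock-free data structures, or shard the data so fewer threads compete for the same lock.

GC pauses: tune the collector; run with a smaller heap.

Cold caches: warm caches up before taking traffic; keep per-region caches.

Saturation: improve load balancing; rebalance when load becomes uneven.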
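As an illustration of the sharding idea from the lock contention item, here is a minimal sketch of a counter split across several independently locked shards. The `ShardedCounter` class and its shard count are hypothetical, and how much this helps depends on the runtime and on how evenly keys spread across shards.

```python
import threading

class ShardedCounter:
    """Counter split across shards so concurrent writers rarely share a lock."""

    def __init__(self, num_shards=16):
        self._locks = [threading.Lock() for _ in range(num_shards)]
        self._counts = [{} for _ in range(num_shards)]

    def _shard(self, key):
        # The same key always maps to the same shard within a process.
        return hash(key) % len(self._locks)

    def increment(self, key, amount=1):
        i = self._shard(key)
        with self._locks[i]:  # only writers to this shard wait here
            self._counts[i][key] = self._counts[i].get(key, 0) + amount

    def get(self, key):
        i = self._shard(key)
        with self._locks[i]:
            return self._counts[i].get(key, 0)

# Usage: four threads each increment ten keys 100 times.
counter = ShardedCounter(num_shards=4)

def worker():
    for n in range(1000):
        counter.increment(f"key-{n % 10}")

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

With a single global lock, every writer queues behind every other writer. With shards, a writer waits only for writers that hit the same shard, which shortens the worst-case wait that shows up in the tail.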

Tail-aware monitoring

Record latency as histograms (Prometheus _bucket series) rather than averages, so percentiles can be computed from the bucket counts.
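To show how a percentile can be recovered from cumulative bucket counts, here is a simplified sketch. It interpolates linearly inside the bucket that contains the target rank, which is similar in spirit to Prometheus's `histogram_quantile` but is not its implementation. The `estimate_quantile` function and the bucket values are assumptions for illustration.

```python
import math

def estimate_quantile(q, buckets):
    """Estimate the q-quantile (0 < q <= 1) from cumulative histogram buckets.

    buckets: list of (upper_bound, cumulative_count) pairs sorted by
    upper_bound and ending with (math.inf, total_count).
    """
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0
    for upper, count in buckets:
        if count >= rank:
            if math.isinf(upper):
                # The rank falls past the last finite bucket; the best
                # available answer is that bucket's upper bound.
                return prev_bound
            # Assume observations are spread evenly inside the bucket.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (upper - prev_bound) * fraction
        prev_bound, prev_count = upper, count
    return prev_bound

# Hypothetical request-duration buckets in seconds.
buckets = [(0.1, 900), (0.5, 980), (1.0, 995), (math.inf, 1000)]
print(round(estimate_quantile(0.99, buckets), 3))
```

The estimate is only as precise as the bucket boundaries. If p99 matters around 500 ms, it helps to have several bucket bounds near that value rather than one wide bucket covering it.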

Alert on percentiles such as p99, not just on the average.

Track percentiles per tenant to catch tails that affect only one tenant.
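The per-percentile and per-tenant points above can be sketched together: group latencies by tenant, compute p99 for each, and flag tenants over a latency budget. The function names, tenant IDs, and the 200 ms budget are hypothetical, and raw samples stand in for whatever your metrics backend actually stores.

```python
from collections import defaultdict

def p99(samples):
    """Nearest-rank p99 of a non-empty list of latencies."""
    ordered = sorted(samples)
    rank = (99 * len(ordered) + 99) // 100  # integer ceil(0.99 * n)
    return ordered[rank - 1]

def tenants_over_budget(requests, budget_ms):
    """Return {tenant: p99_ms} for tenants whose p99 exceeds budget_ms.

    requests: iterable of (tenant_id, latency_ms) pairs.
    """
    by_tenant = defaultdict(list)
    for tenant, latency_ms in requests:
        by_tenant[tenant].append(latency_ms)
    result = {}
    for tenant, latencies in by_tenant.items():
        value = p99(latencies)
        if value > budget_ms:
            result[tenant] = value
    return result

# Hypothetical traffic: a large healthy tenant and a small struggling one.
requests = (
    [("tenant-a", 40)] * 1000
    + [("tenant-b", 40)] * 45
    + [("tenant-b", 900)] * 5
)

global_p99 = p99([latency for _, latency in requests])
flagged = tenants_over_budget(requests, budget_ms=200)
print("global p99:", global_p99, "flagged:", flagged)
```

In this made-up traffic the global p99 is 40 ms, because tenant-b's slow requests are less than 1% of all requests. Only the per-tenant view shows that tenant-b's own p99 is 900 ms.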

Antipatterns

What to do this week

Three moves. (1) Pick your slowest production endpoint and apply the mitigation that matches the cause of its tail. (2) Measure p99 before and after. (3) Document the win and ship the runbook so the team can reproduce it.
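For step (2), a small harness along these lines may be enough to start with. It is a sketch under assumptions: `measure_latencies_ms`, `p99`, and `compare` are hypothetical helpers, and timing a callable in a loop is a stand-in for production measurements taken from real traffic.

```python
import time

def measure_latencies_ms(fn, runs=1000):
    """Call fn() `runs` times and return each call's wall-clock latency in ms."""
    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        latencies.append((time.perf_counter() - start) * 1000)
    return latencies

def p99(samples):
    """Nearest-rank p99 of a non-empty list of latencies."""
    ordered = sorted(samples)
    rank = (99 * len(ordered) + 99) // 100  # integer ceil(0.99 * n)
    return ordered[rank - 1]

def compare(before, after):
    """Return (p99 before, p99 after, relative change in percent)."""
    b, a = p99(before), p99(after)
    return b, a, (a - b) / b * 100

# Hypothetical samples recorded before and after a fix, in milliseconds.
before = [100] * 98 + [1000] * 2
after = [100] * 98 + [400] * 2
print(compare(before, after))
```

Use the same sample size and the same traffic mix for both runs. Otherwise the difference in p99 may reflect the workload rather than the change.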