SLOs Against Aggregations vs Against Percentiles

The aggregation you choose changes the meaning of the SLO. Pick deliberately, not by default.

Why the aggregation matters

Average latency hides the long tail; p99 surfaces it. Two services with the same average can have radically different user experience; the aggregation choice changes what the SLO actually measures.

Average flattens the tail. A 95th-percentile spike vanishes into the mean; users in the tail experience pain the SLO does not see.
Percentiles preserve shape. p99 marks the threshold of the slowest 1% of requests, which is much closer to the user experience of "this app feels slow."
Same mean, different shapes. One service: tight distribution. Another: bimodal with a slow tail. Same average, different SLO meaning.
Pick deliberately. The aggregation is a product decision, not a default; pick to match user perception, not query convenience.
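
To make "same mean, different shapes" concrete, here is a minimal sketch. The latency values are invented for illustration, and the nearest-rank `percentile` helper is an assumption; your metrics backend may interpolate and give slightly different numbers.

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

def mean(samples):
    return sum(samples) / len(samples)

# Hypothetical latencies in milliseconds, 1000 requests per service.
tight = [100] * 1000                  # every request takes 100 ms
bimodal = [80] * 980 + [1080] * 20    # 2% of requests take over a second

for name, samples in (("tight", tight), ("bimodal", bimodal)):
    print(f"{name}: mean={mean(samples):.0f} ms, p99={percentile(samples, 99)} ms")
# tight: mean=100 ms, p99=100 ms
# bimodal: mean=100 ms, p99=1080 ms
```

An SLO on the average would treat these two services as identical; an SLO on p99 would not.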

Four common shapes

Four aggregation shapes cover most SLO definitions. Each has a sweet spot and a failure mode; knowing both prevents picking the wrong one by default.

Average. Cheap to compute; misleading for tail-sensitive workloads; almost never the right SLO base for user-facing services.
p50 / median. Better than average; still says nothing about the slower half of requests; useful as a paired signal, not a primary SLO.
p99. Catches the long tail; users in the tail are real users; the default for most consumer SaaS latency SLOs.
p99.9. Bleeding edge; one request in 1000; only justified for high-stakes services with budget for the engineering work.
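
As a rough illustration of how far apart the four shapes can sit on the same data, the sketch below summarizes one invented window of 10,000 request latencies. The sample values and the nearest-rank `percentile` helper are assumptions, not output from any particular monitoring tool.

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

# Hypothetical window: 10,000 request latencies in milliseconds.
latencies = [50] * 9000 + [200] * 890 + [800] * 95 + [3000] * 15

summary = {
    "average": sum(latencies) / len(latencies),
    "p50": percentile(latencies, 50),
    "p99": percentile(latencies, 99),
    "p99.9": percentile(latencies, 99.9),
}

for shape, value in summary.items():
    print(f"{shape}: {value} ms")
# average: 74.9 ms
# p50: 50 ms
# p99: 800 ms
# p99.9: 3000 ms
```

The same traffic reads as 50 ms, 75 ms, 800 ms, or 3 s depending on which shape the SLO is written against.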

Matching to user perception

Users perceive "slow" at roughly the p95-p99 of their own request stream. Counterintuitively, the p99 of your aggregate is roughly the worst request the median (p50) user sees, because each user makes many requests; pick the SLO base to model user experience, not infrastructure tidiness.

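Users feel the tail. A user sending 100 requests will hit your p99 once on average, and has about a 63% chance of hitting it at least once if requests are independent; that one slow request is what they remember.
Aggregate p99 vs user p50. The math: at around 100 requests per user, your aggregate p99 ≈ the worst request seen by the median (p50) user.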
Consumer SaaS default. p99 is the right SLO base; matches user perception without the engineering cost of p99.9.
B2B SLOs differ. Single-customer requests at high volume need p99.9; the user is a system, and systems notice 1-in-1000.
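
The arithmetic behind this section can be sketched in a few lines. It assumes each request independently has the same chance of landing in the tail, which real traffic only approximates; the function name is hypothetical.

```python
def chance_of_tail_hit(requests, tail_fraction):
    """Probability that at least one of `requests` independent requests
    falls in the slowest `tail_fraction` of the aggregate distribution."""
    return 1 - (1 - tail_fraction) ** requests

for requests in (10, 100, 1000):
    p99_hit = chance_of_tail_hit(requests, 0.01)     # slower than aggregate p99
    p999_hit = chance_of_tail_hit(requests, 0.001)   # slower than aggregate p99.9
    print(f"{requests:>5} requests: p99 tail {p99_hit:.1%}, p99.9 tail {p999_hit:.1%}")
```

At 100 requests the chance of at least one p99-slow request is about 63%, so more than half of such users see it. At 1000 requests the same is true of p99.9, which is why a high-volume single customer notices 1-in-1000.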

Combining shapes

Some teams ship separate SLOs at p50 and p99 simultaneously. The pair catches both classes of regression: median drift (everyone slower) and tail expansion (some users much slower). The cost is 2x SLO maintenance; worth it for high-stakes services.

p50 SLO catches median drift. Every request slower; capacity issue or upstream slowdown; the broad regression signal.
p99 SLO catches tail expansion. Some requests much slower; resource contention, cold cache, GC pause; the targeted signal.
Cost. 2x dashboards, 2x alerts, 2x burn-rate rules; the maintenance overhead is real.
When worth it. Payment processing, healthcare, anything where both classes of regression cost real money.
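
One way the paired SLOs could separate the two regression classes is sketched below. The thresholds, sample data, and the `LatencySlo` and `classify` names are all hypothetical; a production setup would evaluate these in the monitoring system rather than in application code.

```python
import math
from dataclasses import dataclass

def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

@dataclass(frozen=True)
class LatencySlo:
    pct: float           # percentile the SLO is written against
    threshold_ms: float  # latency that percentile must stay at or under

    def met(self, samples):
        return percentile(samples, self.pct) <= self.threshold_ms

# Hypothetical targets for one service.
MEDIAN_SLO = LatencySlo(pct=50, threshold_ms=150)
TAIL_SLO = LatencySlo(pct=99, threshold_ms=500)

def classify(samples):
    """Label a window of latencies by which SLO, if any, it breaches."""
    if not MEDIAN_SLO.met(samples):
        return "median drift"      # everyone slower; the tail usually moves too
    if not TAIL_SLO.met(samples):
        return "tail expansion"    # median fine, some requests much slower
    return "healthy"

baseline = [100] * 990 + [400] * 10
everyone_slower = [180] * 990 + [480] * 10
slow_tail = [100] * 950 + [900] * 50

for name, window in (("baseline", baseline),
                     ("everyone_slower", everyone_slower),
                     ("slow_tail", slow_tail)):
    print(f"{name}: {classify(window)}")
```

Checking the median first is a design choice: a broad slowdown often drags the tail along with it, so a p50 breach is reported as drift even when p99 is also breached.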

Antipatterns

Average-only SLO. Misses tail outages.
p99.9 SLO without budget for the tail. Always burning.
Same shape for every service. Mismatch with user model.

What to do this week

Three moves. (1) Choose the aggregation deliberately for your most impactful service. (2) Measure adherence for 30 days. (3) Rewrite the policy or the SLO if the gap persists.