SLOs by Customer Segment
Different SLOs for different customers.
Why segment SLOs
Customers are not equal. A Premium customer paying $5k/month deserves tighter availability than a Basic customer paying nothing, and a single SLO across all customers averages their experience and hides that asymmetry. Segmented SLOs let you sell reliability as part of the pricing tier (pay more, get tighter availability) and let operational priority follow the segment: premium degradation demands all hands, while free-tier degradation may be acceptable.
- Customers are not equal. Premium at $5k/month deserves tighter availability than Basic paying nothing.
- Single SLO hides asymmetry. The aggregate average masks the experience differences.
- Pricing-tier alignment. Reliability SLA becomes a product feature; pay more, get tighter availability.
- Operational priority follows segment. Premium degradation: all-hands; free-tier degradation: maybe acceptable.
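To make the masking effect concrete, here is a minimal sketch with made-up numbers. The request counts and failure counts are hypothetical, chosen only to show how a small premium segment can be badly degraded while the aggregate still looks healthy.

```python
# Hypothetical one-day request counts per segment: (total, failed).
traffic = {
    "premium": (20_000, 400),      # 2% of premium requests failed
    "standard": (480_000, 240),
    "free": (1_500_000, 750),
}


def availability(total: int, failed: int) -> float:
    """Fraction of requests that succeeded."""
    return 1 - failed / total


# Availability computed separately for each segment.
per_segment = {name: availability(t, f) for name, (t, f) in traffic.items()}

# Availability computed over all requests, ignoring the segment.
total_requests = sum(t for t, _ in traffic.values())
total_failed = sum(f for _, f in traffic.values())
aggregate = availability(total_requests, total_failed)

for name, value in per_segment.items():
    print(f"{name:9s} {value:.2%}")
print(f"{'aggregate':9s} {aggregate:.2%}")
```

With these numbers the aggregate stays above 99.9% while premium sits at 98%, because premium is only 1% of total traffic.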
How to structure segment SLOs
Structure starts at the edge. Tag every request with its customer segment at the edge: the authentication layer or API gateway adds a header, and downstream metrics carry it as a label. Then define per-segment availability and latency SLOs, for example Premium 99.95% / p99 < 200ms, Standard 99.9% / p99 < 500ms, and Free 99% / p99 best-effort. Keep the segment count low: 3-5 segments works, while 10 creates sprawl.
- Edge-layer segment tag. Auth layer or API gateway; downstream metrics carry the label.
- Per-segment SLOs. Premium 99.95% / p99 < 200ms; Standard 99.9% / p99 < 500ms; Free 99% / best-effort.
- Cap at 3-5 segments. 10 creates dashboard sprawl and metric cost.
- Commercial commitments only. Pick segments that map to contracts, not marketing personas.
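The structure above can be sketched as a small SLO table plus an edge-tagging helper. This is an illustration under assumptions, not a reference implementation: the header name `X-Customer-Segment`, the `tag_request` helper, the fallback to "free" for unknown plans, and the 30-day window are all hypothetical. Only the targets come from the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SegmentSLO:
    availability: float            # target fraction of successful requests
    p99_latency_ms: Optional[int]  # None means best-effort, no latency target


# Targets taken from the text above.
SEGMENT_SLOS = {
    "premium": SegmentSLO(0.9995, 200),
    "standard": SegmentSLO(0.999, 500),
    "free": SegmentSLO(0.99, None),
}

SEGMENT_HEADER = "X-Customer-Segment"  # hypothetical header name


def tag_request(headers: dict, plan: Optional[str]) -> dict:
    """Return a copy of headers with the segment set from the authenticated plan.

    Any client-supplied segment header is dropped first, so callers cannot
    claim a tier. Unknown plans fall back to "free" (an assumption).
    """
    segment = plan if plan in SEGMENT_SLOS else "free"
    tagged = {
        k: v for k, v in headers.items() if k.lower() != SEGMENT_HEADER.lower()
    }
    tagged[SEGMENT_HEADER] = segment
    return tagged


def allowed_downtime_minutes(slo: SegmentSLO, window_days: int = 30) -> float:
    """Error budget expressed as minutes of full outage in the window."""
    return (1 - slo.availability) * window_days * 24 * 60
```

Over a 30-day window these targets allow roughly 21.6 minutes of full outage for Premium, 43.2 minutes for Standard, and 432 minutes for Free, which is a convenient way to explain the tiers in a contract.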
Operating segment SLOs in production
Operating segment SLOs requires per-segment infrastructure. Build a per-segment dashboard that shows each segment’s SLO health, with burn-rate alerts per segment. Plan capacity per segment, because premium may need dedicated capacity, separate connection pools, or prioritised queue lanes. In incident triage, look at per-segment impact first: sev 1 if premium is degraded, sev 2 if only standard is.
- Per-segment dashboard. Each segment’s SLO health visible; burn-rate alerts per segment.
- Per-segment capacity. Premium may need dedicated capacity, separate pools, prioritised queues.
- Per-segment triage. Sev 1 if premium degraded; sev 2 if only standard; severity rubric encodes segmentation.
- Architecture follows the SLO. The SLO drives the architecture; capacity follows the commitment.
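One way to encode per-segment burn-rate alerting and the severity rubric is sketched below. The burn rate here is the observed error rate divided by the error rate the SLO allows. The 14.4 threshold is a commonly used fast-burn value and is an assumption, as is mapping free-tier degradation to sev 3; the text only specifies sev 1 for premium and sev 2 for standard.

```python
from typing import Dict, Optional, Tuple

TARGETS = {"premium": 0.9995, "standard": 0.999, "free": 0.99}

# Sev 1 and sev 2 follow the rubric above; sev 3 for free is an assumption.
SEVERITY = {"premium": 1, "standard": 2, "free": 3}

FAST_BURN = 14.4  # assumed alert threshold, tune to your own windows


def burn_rate(segment: str, total: int, failed: int) -> float:
    """Observed error rate divided by the error rate the SLO allows."""
    if total == 0:
        return 0.0
    return (failed / total) / (1 - TARGETS[segment])


def triage(window: Dict[str, Tuple[int, int]]) -> Optional[int]:
    """Return the most severe level among degraded segments, or None.

    `window` maps segment name to (total_requests, failed_requests)
    observed in the alerting window.
    """
    degraded = [
        segment
        for segment, (total, failed) in window.items()
        if burn_rate(segment, total, failed) >= FAST_BURN
    ]
    return min((SEVERITY[s] for s in degraded), default=None)


# Premium is burning budget about 20x too fast; the other segments are healthy.
incident = {
    "premium": (10_000, 100),
    "standard": (100_000, 50),
    "free": (1_000_000, 1_000),
}
print(triage(incident))
```

Because severity is derived from the segment rather than from the raw error count, a small premium outage outranks a much larger number of failed free-tier requests.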
Trade-offs and gotchas
Three trade-offs deserve attention. The first is maintenance burden: each segment is another set of dashboards, alerts, and runbooks, so don’t add segments unless they correspond to real commercial differences. The second is internal-only segments, which are usually a mistake because engineering convenience is not a customer commitment. The third is metric cardinality cost: the segment label multiplies the time series count, which at high volume is a real increase in the observability bill.
- Maintenance burden. Each segment is more dashboards, alerts, runbooks; only for real commercial differences.
- Internal-only segments are usually a mistake. Engineering convenience is not a customer commitment.
- Cardinality cost. Segment label adds a multiplier; high volume translates to real bill increase.
- Per-segment ROI test. Each segment must justify its maintenance and cardinality cost.
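The cardinality multiplier is simple arithmetic, sketched here with hypothetical label counts. The product is an upper bound that assumes every label combination is actually observed; real series counts are usually lower, but the segment label scales them by the same factor.

```python
from math import prod
from typing import Dict


def series_count(label_cardinalities: Dict[str, int]) -> int:
    """Upper bound on time series for one metric: product of label cardinalities."""
    return prod(label_cardinalities.values())


# Hypothetical labels on a request metric before segmentation.
base_labels = {"endpoint": 50, "status_class": 5, "region": 4}

without_segment = series_count(base_labels)
with_three = series_count({**base_labels, "segment": 3})
with_ten = series_count({**base_labels, "segment": 10})

print(without_segment, with_three, with_ten)
```

Going from 3 to 10 segments in this example more than triples the series count for every metric that carries the label, which is the cost each extra segment has to justify.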
When to add segment SLOs
Add segment SLOs when pricing tiers exist with reliability commitments; without them, segmentation is optics, not policy. Add them when premium customers complain about reliability that aggregate metrics show as fine. Don’t add them for engineering convenience, and don’t add them when the team is already drowning in operational complexity. Otherwise, wait until pricing or contracts demand it.
- Pricing tiers with commitments. Without them, segmentation is optics not policy.
- Premium complaints with healthy aggregates. Segmentation surfaces what premium experiences vs average.
- Don’t add for convenience. Engineering convenience is not a customer commitment.
- Don’t add when overloaded. Wait until pricing or contracts demand it; the cost is real.