SLO & Reliability | Practical | By Samson Tanimawo, PhD | Published Jun 15, 2025

SLO Coverage Rate

What % of services have SLOs?

Target

The most useful question in a reliability practice is not "what is our SLO" but "how many of our services even have one." Coverage rate, the percentage of services with a defined and measured SLO, is the meta-metric that tells you whether the practice is real or aspirational. A team can have brilliant SLOs on its three flagship services and zero on the other 47, and the average user experience is dictated by the 47 that nobody is watching.

The coverage targets that hold up:

100% of customer-facing services: Anything a paying customer can hit needs an SLO. No exceptions. If a service is in the request path of a logged-in user, it has a contract whether you wrote one down or not. Writing it down is what gives you a chance to keep it.
100% of revenue-path internals: The internal services that sit behind the customer-facing ones (auth, payments, data plane, billing) are equally critical. They do not face customers directly, but their failure cascades into customer impact instantly. SLO them.
80%+ of internal platform services: Services that other engineering teams depend on (build infrastructure, internal APIs, dev environments) need SLOs because they multiply into productivity loss when they fail. The bar is lower than customer-facing but it is not zero.
Best-effort on everything else: Batch jobs, scheduled reports, internal dashboards. Each one carries either an SLO or an explicit "no SLO, best effort" tag, so the absence of an SLO is documented rather than neglected.

The aspirational version is 100% coverage everywhere. The realistic version is high coverage on the tiers that matter and explicit no-SLO labels on the rest. Either way the gap is visible.
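As a rough illustration of the targets above, the check can be sketched in a few lines of Python. The tier names, the `Service` record, and the `best_effort` flag are hypothetical stand-ins for whatever a real service catalog stores; only the percentages come from the list above.

```python
from dataclasses import dataclass

# Targets from the list above; the tier keys are illustrative names.
TARGETS = {
    "customer_facing": 1.00,
    "revenue_path": 1.00,
    "internal_platform": 0.80,
}


@dataclass
class Service:
    name: str
    tier: str                  # a key of TARGETS, or "other"
    has_slo: bool
    best_effort: bool = False  # explicit "no SLO, best effort" tag


def coverage_by_tier(services):
    """Return {tier: (services_with_slo, total_services)}."""
    counts = {}
    for s in services:
        covered, total = counts.get(s.tier, (0, 0))
        counts[s.tier] = (covered + int(s.has_slo), total + 1)
    return counts


def tiers_below_target(services):
    """Return the tiers whose coverage falls short of their target."""
    failing = []
    for tier, (covered, total) in coverage_by_tier(services).items():
        target = TARGETS.get(tier)
        if target is not None and covered / total < target:
            failing.append(tier)
    return sorted(failing)


def undocumented(services):
    """'Everything else' services with neither an SLO nor an explicit tag."""
    return [
        s.name
        for s in services
        if s.tier == "other" and not s.has_slo and not s.best_effort
    ]
```

The "other" tier has no percentage target here; it is held to the documentation rule instead, so `undocumented` lists the services that are silently missing both an SLO and a tag.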

Track

Coverage only improves if you measure it. The mechanism is dead simple and most teams skip it anyway: a service catalog, a column for "has SLO," and a quarterly review of the gaps.

Quarterly audit of the catalog: Every service in the registry gets reviewed once a quarter for SLO presence. The output is a list of services without one, each with an assigned owner and a target deadline for the fix. The audit takes hours, not days, if you have a service catalog.
Surface gaps in the dashboard: The reliability dashboard should have a coverage tile next to the SLO tiles. "47 of 52 customer-facing services have SLOs (90% coverage)." This is uncomfortable, which is the point. The five missing services get a name and an owner.
Track new services from day one: Every service launch checklist includes "SLO defined." If a service ships without one, it is a process bug, not a "we'll get to it." The cheapest time to define an SLO is before the service has users.
Track decommission: When a service retires, the SLO retires with it. Stale SLOs and stale entries for dead services pollute the dashboard and distort the coverage number, so what looks like poor coverage may really be outdated catalog data.
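A minimal sketch of the audit and the dashboard tile, assuming the catalog can be exported as a list of dictionaries. The field names (`name`, `has_slo`, `owner`, `retired`) and the `UNASSIGNED` placeholder are assumptions made for illustration, not a real catalog schema.

```python
from datetime import date


def coverage_tile(label, covered, total):
    """Format the coverage tile text shown next to the SLO tiles."""
    pct = round(100 * covered / total) if total else 0
    return f"{covered} of {total} {label} services have SLOs ({pct}% coverage)"


def audit(catalog, deadline):
    """Return one gap row per active service that has no SLO."""
    gaps = []
    for entry in catalog:
        if entry.get("retired"):
            continue  # retired services drop out of the audit entirely
        if not entry.get("has_slo"):
            gaps.append({
                "service": entry["name"],
                "owner": entry.get("owner", "UNASSIGNED"),
                "deadline": deadline.isoformat(),
            })
    return sorted(gaps, key=lambda g: g["service"])


if __name__ == "__main__":
    catalog = [
        {"name": "search", "has_slo": False, "owner": "team-discovery"},
        {"name": "checkout", "has_slo": True, "owner": "team-payments"},
        {"name": "legacy-export", "has_slo": False, "retired": True},
    ]
    print(coverage_tile("customer-facing", 47, 52))
    for row in audit(catalog, date(2025, 9, 30)):
        print(row)
```

Skipping retired entries before counting gaps keeps decommissioned services from showing up as false coverage gaps.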

What gets measured gets defined. The coverage number itself is the lever. Once it is on the dashboard, the gap closes faster than anyone expects.

Compound

The compounding effect of high coverage is the real prize. A reliability practice with 90% SLO coverage is qualitatively different from one with 20%, even when the SLOs themselves are identical.

Dependency math becomes possible: When most of your services have SLOs, you can compute composite reliability across a request path. Without coverage, every dependency is an unknown and every analysis is guesswork.
Trade-offs get specific: "We can ship feature X this quarter, but it will likely consume 30% of the error budget on services Y and Z" is a conversation you can have with product. Without SLOs, the answer is always "we think it will be fine," which means everybody finds out together.
Year-over-year trends emerge: When you can chart coverage at 65%, then 78%, then 89% across consecutive quarters, the practice has visible momentum. New hires see a culture moving the right direction. Leadership has something concrete to point at when justifying reliability investment.
Healthy operational signal: Coverage is the leading indicator. SLO performance is the lagging one. A team that is closing coverage gaps is going to ship better reliability next quarter even if this quarter's numbers look identical to last.
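The dependency math and budget trade-offs mentioned in the list above can be sketched as follows. This is a simplified model: it assumes every hop in the request path is a hard serial dependency and that failures are independent, so availabilities multiply. The service names and function names are hypothetical.

```python
import math


def composite_availability(path, slo_targets):
    """Multiply availability targets along a serial request path.

    Returns (availability, missing). If any service on the path has no
    SLO, availability is None and missing lists the uncovered services.
    """
    missing = [svc for svc in path if svc not in slo_targets]
    if missing:
        return None, missing
    return math.prod(slo_targets[svc] for svc in path), []


def budget_share(target, bad_events, total_events):
    """Fraction of the error budget consumed by bad_events."""
    allowed = (1.0 - target) * total_events
    if allowed <= 0:
        raise ValueError("a 100% target leaves no error budget")
    return bad_events / allowed


if __name__ == "__main__":
    targets = {"edge": 0.999, "auth": 0.9995, "payments": 0.999}
    print(composite_availability(["edge", "auth", "payments"], targets))
    print(composite_availability(["edge", "cache"], targets))
    print(budget_share(0.999, 300, 1_000_000))
```

Returning the list of uncovered services instead of a guessed number reflects the point about coverage: one dependency without an SLO makes the whole path's figure unknown.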

SLO coverage is the cheapest reliability investment a team can make and the one most teams forget to make. Nova AI Ops auto-discovers services, flags the ones without SLOs, surfaces the coverage rate as a first-class metric, and lets you track quarter-over-quarter movement so you can see the practice maturing instead of guessing whether it is.