SLI vs SLO vs SLA: Practical Distinction
Three terms; specific meanings.
SLI, SLO, and SLA are three related but distinct concepts in reliability engineering. The terms are regularly conflated, and the conflation confuses conversations between engineering, product, sales, and customers. Distinguishing them precisely is the foundation for any serious reliability practice.
SLI
What an SLI actually is:
- Service Level Indicator: The measurement itself, the number that tells you how the service is performing on some dimension: latency, error rate, availability, throughput, freshness. Each is a specific quantity that the metric pipeline produces.
- What you measure: The SLI is data, not a target. "P99 latency over the past hour was 230 ms" is an SLI value. The SLI tells you what is happening; it has no opinion about whether the value is good or bad.
- Defined per dimension: A service has multiple SLIs covering different dimensions. Availability SLI: percentage of successful requests. Latency SLI: percentile-based response time. Each is a separate measurement; together they describe the service's behavior.
- Measured continuously: SLIs are computed continuously from the metric pipeline. The current value is always available; the time series shows the history. This data is the foundation for everything else.
- Tied to user experience: Good SLI definitions reflect what users actually experience. A server-side error rate that matches the client-perceived failure rate is good; one that excludes 4xx responses the user actually saw is bad. An SLI is only as useful as its definition is faithful to reality.
The SLI is the input. Without good SLIs, the SLO and SLA are built on guesswork.
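To make the "measurement, not target" idea concrete, here is a minimal Python sketch that computes an availability SLI and a latency-percentile SLI from a window of request records. The `Request` type, its field names, the nearest-rank percentile method, and the sample data are all assumptions made for illustration; a real metric pipeline would compute these from its own telemetry.

```python
import math
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Request:
    """One observed request. Field names are assumptions for this sketch."""
    status: int        # HTTP status code returned to the client
    latency_ms: float  # response time in milliseconds


def availability_sli(
    requests: Sequence[Request],
    is_good: Callable[[Request], bool],
) -> Optional[float]:
    """Fraction of requests counted as good under the given definition."""
    if not requests:
        return None  # no traffic means no measurement, not 100%
    return sum(1 for r in requests if is_good(r)) / len(requests)


def latency_sli(requests: Sequence[Request], percentile: float) -> Optional[float]:
    """Nearest-rank percentile of latency, in milliseconds."""
    if not requests:
        return None
    ordered = sorted(r.latency_ms for r in requests)
    rank = max(1, math.ceil(percentile / 100 * len(ordered)))
    return ordered[rank - 1]


# Made-up sample window of five requests.
window = [
    Request(200, 120.0),
    Request(200, 180.0),
    Request(404, 90.0),
    Request(500, 950.0),
    Request(200, 230.0),
]

# Same data, two definitions of "good": the SLI value depends on the definition.
server_only = availability_sli(window, lambda r: r.status < 500)   # 4 of 5 good
user_visible = availability_sli(window, lambda r: r.status < 400)  # 3 of 5 good
p99_ms = latency_sli(window, 99)  # the slowest request in a sample this small
```

Passing the success predicate in explicitly keeps the definition of "good" visible, which is the part of an SLI most likely to drift away from what users experience.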
SLO
An SLO is a target the team commits to internally. Given an SLI, the SLO is the threshold the team aims to keep the SLI above (or below, for metrics where lower is better). The SLO is engineering's reliability commitment to itself.
- Service Level Objective: The internal target. "P99 latency below 500 ms, 99.9% of the time, over a 28-day window" is an SLO. The target is specific; the time window is specific; the metric is specific.
- What you commit to internally: The SLO is engineering's commitment, not the contractual commitment. Engineering operates against this target; burn-rate alerts fire when the target is at risk; deploy gates use it.
- Tighter than the SLA: The SLO target is tighter than the customer-facing SLA. The buffer between them absorbs routine variability. Engineering aims to meet the SLO so that the SLA is met with margin.
- Drives operational decisions: Deploy freezes, on-call responses, and reliability sprints all key off the SLO. The SLO is the operational lever; it shapes day-to-day engineering behavior.
- Reviewed regularly: The SLO target gets reviewed quarterly or annually against actual performance. If the target is consistently met, consider tightening it. If it is consistently missed, investigate why and either invest more or relax the target.
The SLO is the bridge between the measurement (SLI) and the commitment (SLA). Engineering operates against the SLO; the SLA is the customer-facing reflection of it.
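The relationship between an SLO target, its error budget, and a burn-rate alert can be sketched in a few lines of Python. This is an illustrative sketch, not a prescribed implementation: the function names, the 99.95% and 99.9% targets, the 30-day window, and the event counts are hypothetical. Burn rate is taken here as the observed bad fraction divided by the fraction the SLO allows.

```python
def error_budget(slo_target: float) -> float:
    """Allowed fraction of bad events, e.g. about 0.001 for a 99.9% target."""
    return 1.0 - slo_target


def burn_rate(slo_target: float, bad_events: int, total_events: int) -> float:
    """Observed bad fraction divided by the allowed bad fraction.

    1.0 means the budget is being spent exactly as fast as the SLO allows;
    a sustained value above 1.0 exhausts it before the window ends.
    """
    if total_events == 0:
        return 0.0  # no traffic, nothing burned
    return (bad_events / total_events) / error_budget(slo_target)


def allowed_bad_minutes(target: float, window_days: int) -> float:
    """Minutes of full outage a time-based target tolerates in the window."""
    return (1.0 - target) * window_days * 24 * 60


# Hypothetical targets: the internal SLO is tighter than the published SLA.
SLO_TARGET = 0.9995
SLA_TARGET = 0.999

# 25 bad requests out of 10,000 in the last hour: roughly 5x the allowed rate.
hourly_burn = burn_rate(SLO_TARGET, bad_events=25, total_events=10_000)

# Over 30 days the SLO tolerates about 21.6 minutes, the SLA about 43.2.
slo_minutes = allowed_bad_minutes(SLO_TARGET, window_days=30)
sla_minutes = allowed_bad_minutes(SLA_TARGET, window_days=30)
buffer_minutes = sla_minutes - slo_minutes  # the margin between the two
```

With these assumed numbers the buffer is roughly 21.6 minutes per 30 days: an incident that blows the internal SLO still leaves room before the contractual SLA is breached.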
SLA
An SLA is the contractual reliability commitment: the number you publish to customers, with financial and contractual consequences for missing it.
- Service Level Agreement: The contract with customers. "99.9% availability over each calendar month" is an SLA. The terms are written in the customer agreement; the legal team owns the language.
- With penalties for missing: Most SLAs include service credits when breached, typically 10 to 50% of the affected month's fees, scaled by the depth of the breach. The credits are the financial consequence; repeated breaches add up to significant cost.
- What customers can sue over: The SLA is a contractual commitment. Customers who do not get the credits they are owed can sue for breach. Both sides take the SLA seriously because the contract terms have legal weight.
- Looser than the SLO: The SLA is looser than engineering's internal SLO. The gap absorbs variability that would breach the SLO but must not breach the SLA.
- Published to customers: The SLA appears on the public docs page, in customer contracts, and in marketing materials. It is the number customers see and reference. Engineering's internal SLO is not published; it is operationalized.
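How credits scale with the depth of a breach can be illustrated with a small tiered lookup. The schedule below is entirely hypothetical, chosen only to fall inside the 10 to 50% range mentioned above; real tiers, thresholds, and the fee basis come from the customer agreement.

```python
# Hypothetical schedule, deepest breach first:
# (availability below this value, credit as a percentage of the monthly fee).
CREDIT_TIERS = [
    (0.95, 50),
    (0.99, 25),
    (0.999, 10),
]


def service_credit(measured_availability: float, monthly_fee: float) -> float:
    """Credit owed for one month under the hypothetical schedule above."""
    for threshold, credit_percent in CREDIT_TIERS:
        if measured_availability < threshold:
            return monthly_fee * credit_percent / 100
    return 0.0  # SLA met, no credit owed


met = service_credit(0.9995, monthly_fee=2000.0)      # above 99.9%: no credit
shallow = service_credit(0.9985, monthly_fee=2000.0)  # just under 99.9%: 10% tier
deep = service_credit(0.94, monthly_fee=2000.0)       # under 95%: 50% tier
```

Checking the deepest tier first means a severe breach is matched to its own tier rather than to the shallowest one it also falls under.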
SLI, SLO, and SLA name three different things: what you measure, what you target, and what you commit to contractually. Each has its own audience, its own time scale, and its own consequences. Mixing them up in conversation is the most common cause of unproductive reliability discussions. Nova AI Ops tracks all three concepts distinctly per service, surfaces them on the dashboards each audience needs, and produces the data that supports each layer of the practice.