SLI vs SLO vs SLA: The Three-Letter Acronyms That Actually Matter
SLIs measure, SLOs aim, SLAs bite. Most teams conflate them, and that is why their reliability conversations go in circles.
Definitions, tight
- SLI: Service Level Indicator. A number you measure. E.g., “99.94% of requests returned 2xx/3xx in under 500ms over the last 28 days.”
- SLO: Service Level Objective. An internal target for the SLI. E.g., “The SLI will be at least 99.9% over 28 days.”
- SLA: Service Level Agreement. A contractual promise, usually with refunds for breach. E.g., “If availability drops below 99.5% in a month, customer gets 10% credit.”
SLI: what you measure
Good SLIs are user-facing and measurable with your existing observability stack. Bad SLIs are proxies for a proxy (“CPU usage” as a proxy for “the service is healthy”).
Each SLI should have a clear numerator (successful events) and denominator (eligible events). “99.9% availability” with no definition of either is a conversation, not a measurement.
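A minimal sketch of that numerator/denominator split, using the good/valid definitions from the Checkout API template in this post. The `Request` record and the function names are assumptions made for illustration. In practice these counts would come from your metrics or log pipeline, not from an in-memory list.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Request:
    """Hypothetical request record; field names are illustrative."""
    path: str
    status: int
    latency_ms: float


def is_valid(req: Request) -> bool:
    # Denominator: every request to /checkout/* is eligible.
    return req.path.startswith("/checkout/")


def is_good(req: Request) -> bool:
    # Numerator: 2xx/3xx status returned in under 500 ms.
    return 200 <= req.status < 400 and req.latency_ms < 500


def success_rate_sli(requests: Iterable[Request]) -> Optional[float]:
    """Return good/valid, or None when there were no valid events."""
    valid = [r for r in requests if is_valid(r)]
    if not valid:
        return None  # no eligible traffic: the SLI is undefined, not 100%
    good = sum(1 for r in valid if is_good(r))
    return good / len(valid)


sample = [
    Request("/checkout/cart", 200, 120.0),  # good
    Request("/checkout/pay", 302, 480.0),   # good
    Request("/checkout/pay", 200, 750.0),   # valid, but too slow
    Request("/checkout/pay", 500, 90.0),    # valid, but an error
    Request("/health", 200, 5.0),           # not a valid event, ignored
]
print(success_rate_sli(sample))  # 0.5
```

Returning `None` for an empty window is a design choice in this sketch: reporting 100% when nothing was measured would hide an outage that drops all traffic.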
SLO: what you aim for
An SLO is an internal commitment. It should be set slightly tighter than the SLA, so internal breaches trigger alarms before contractual ones trigger refunds.
Rule of thumb: if your SLA is 99.5%, set your SLO at 99.9%. The gap is your buffer.
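To put a number on that buffer, here is a rough sketch that converts each target into an allowed count of failed events. The traffic volume is a made-up figure, and the sketch assumes the same volume for both targets, which is a simplification: an SLO and an SLA often use different windows.

```python
from decimal import Decimal


def allowed_bad_events(target_percent: str, valid_events: int) -> Decimal:
    """Error budget: how many valid events may fail before the target is missed.

    Decimal is used so that 100 - 99.9 is exactly 0.1, with no float drift.
    """
    budget_fraction = (Decimal(100) - Decimal(target_percent)) / Decimal(100)
    return budget_fraction * valid_events


valid_events = 1_000_000  # hypothetical request volume for the window

slo_budget = allowed_bad_events("99.9", valid_events)  # equals 1000
sla_budget = allowed_bad_events("99.5", valid_events)  # equals 5000
buffer = sla_budget - slo_budget                       # equals 4000

print(f"SLO allows {slo_budget:.0f} failures, SLA allows {sla_budget:.0f}")
print(f"Buffer between internal alarm and contractual breach: {buffer:.0f}")
```

Under these assumptions, the internal alarm fires after 1,000 failed requests, leaving 4,000 more before the contract is breached.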
SLA: what you promise contractually
An SLA has legal weight. Breach has a cost: usually a credit to the customer, sometimes termination rights in the contract.
Most engineering teams should not own SLAs; those are commercial and legal terms. The engineering job is to run the system well enough that the SLO stays green, which keeps the SLA green, which keeps finance out of your Slack.
A one-page template
# Service: Checkout API
SLI (success-rate):
  Good events: HTTP responses with 2xx/3xx status in <500ms
  Valid events: All HTTP requests to /checkout/*
SLO:
  Target: 99.9% over 28 days
  Breach response: engineering feature-freezes (per error-budget policy)
SLA:
  Target: 99.5% over calendar month
  Breach response: 10% credit to affected customers, per MSA v4
Every service your company cares about should have one of these. If it does not fit on a page, it is wrong.
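If you want the template to be checkable, one possible encoding is sketched below. The class and function names are hypothetical, and the sketch compares a single measured SLI against both targets, ignoring that the template's SLO and SLA use different windows (28 days versus a calendar month).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceTargets:
    """Hypothetical encoding of the one-page template's two targets."""
    service: str
    slo_target: float  # internal objective as a fraction, e.g. 0.999
    sla_target: float  # contractual floor as a fraction, e.g. 0.995

    def __post_init__(self) -> None:
        # The SLO must be tighter than the SLA, or there is no buffer.
        if self.slo_target <= self.sla_target:
            raise ValueError("SLO target must be tighter than SLA target")


def breach_response(targets: ServiceTargets, measured_sli: float) -> str:
    """Return the most severe response that the measured SLI triggers."""
    if measured_sli < targets.sla_target:
        return "SLA breached: 10% credit to affected customers, per MSA v4"
    if measured_sli < targets.slo_target:
        return "SLO breached: feature freeze (per error-budget policy)"
    return "within SLO"


checkout = ServiceTargets("Checkout API", slo_target=0.999, sla_target=0.995)
for sli in (0.9994, 0.997, 0.99):
    print(sli, "->", breach_response(checkout, sli))
```

The constructor check makes the "SLO tighter than SLA" rule something that fails loudly at definition time instead of being discovered during an incident.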
How to disambiguate in a meeting
Anyone who uses the three interchangeably owes the group a specific number. “Our availability SLA is…” without a number is a sentence that stops the meeting.
The product manager cares about the SLA because customers wrote it into their contracts. The engineering manager cares about the SLO because missing it triggers the error-budget policy. The on-call engineer cares about the SLI because it is the number on the dashboard.
Have the one-page template for every service your company cares about. Send it before the meeting, not during.