SLO Deprecation: Retire Old SLOs
Stale SLOs mislead.
Trigger
Most teams have a process for adding SLOs and no process for retiring them. The result, over years of operation, is a dashboard cluttered with SLOs against services that have been retired, use cases that have changed, or assumptions that no longer hold. The dashboard becomes harder to trust because some of the numbers are tracking realities that no longer exist. The fix is deliberate SLO deprecation triggered by specific events.
What should trigger an SLO retirement:
- Service retired: When a service is decommissioned, its SLO retires with it. The dashboard should not have a tile for a service that no longer exists. Stale tiles confuse operators and create false signals (an SLO showing 0% availability because the service is gone, not because it is broken).
- Use case changed: The SLO that was right for the old use case may not be right for the new one. A service that was customer-facing and got moved internal-only has different SLO requirements. The previous SLO target needs to be either retired or repositioned for the new audience.
- Baseline shifted significantly: When the underlying behavior of the service changes enough that the original baseline no longer applies (a major architecture change, a 10x scale increase, a new region launch), the SLO target needs to be re-baselined. The old target is retired; a new one is set.
- Composite SLO supersedes individual ones: When multiple per-dimension SLOs are merged into a composite or vice versa, the predecessors retire. Carrying both the old and the new in parallel produces double-counting and dashboard confusion.
- Each trigger is a flag for retirement: The team should have a list of triggers and a habit of asking "did any of these happen?" during the quarterly SLO review. The retirement is deliberate, not implicit.
The trigger discipline is what keeps the SLO dashboard reflecting current reality. Without it, the dashboard accumulates obsolete SLOs and the team stops trusting any of the numbers.
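The quarterly "did any of these happen?" question can be made mechanical. The following is a minimal sketch, not a reference implementation: `ReviewAnswers`, `RetirementTrigger`, and `fired_triggers` are hypothetical names introduced here for illustration, and the fields simply mirror the four triggers listed above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class RetirementTrigger(Enum):
    """The four retirement triggers from the list above."""
    SERVICE_RETIRED = "service retired"
    USE_CASE_CHANGED = "use case changed"
    BASELINE_SHIFTED = "baseline shifted significantly"
    SUPERSEDED = "superseded by a composite or split SLO"


@dataclass
class ReviewAnswers:
    """Answers recorded for one SLO at the quarterly review (hypothetical shape)."""
    slo_name: str
    service_decommissioned: bool = False
    use_case_changed: bool = False
    baseline_shifted: bool = False
    superseded_by: Optional[str] = None  # name of the replacing SLO, if any


def fired_triggers(answers: ReviewAnswers) -> List[RetirementTrigger]:
    """Return every trigger that fired; an empty list means the SLO stays."""
    fired = []
    if answers.service_decommissioned:
        fired.append(RetirementTrigger.SERVICE_RETIRED)
    if answers.use_case_changed:
        fired.append(RetirementTrigger.USE_CASE_CHANGED)
    if answers.baseline_shifted:
        fired.append(RetirementTrigger.BASELINE_SHIFTED)
    if answers.superseded_by is not None:
        fired.append(RetirementTrigger.SUPERSEDED)
    return fired


# Example review entry with made-up values.
answers = ReviewAnswers("checkout-availability", service_decommissioned=True)
for trigger in fired_triggers(answers):
    print(f"{answers.slo_name}: retirement candidate ({trigger.value})")
```

The function only flags candidates. Whether a flagged SLO is retired or re-baselined remains a decision for the people in the review.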
Process
Retiring an SLO is a small process that is worth doing carefully. The process is what makes the retirement durable: documented, communicated, and audit-trailed so that nobody is confused later about why the SLO disappeared.
- Document retirement: A written record of the retirement: which SLO is being retired, why, when, what is replacing it (if anything), where the historical data is archived. The document goes in the same place SLO definitions live (the team wiki, the SLO platform, the source-controlled config).
- Communicate to stakeholders: Customers depending on the published SLA need notice. Internal stakeholders depending on the dashboard need to know the tile is going away. Compliance teams need the audit record. The communication is a small mailing-list message, not a major announcement, but it has to happen.
- Audit trail of the decision: The decision to retire is captured: who proposed it, who approved it, the reasoning, the date. The audit trail is what auditors want when they ask "why did this control disappear" two years later. The trail also defends against the case where someone questions the retirement after the fact.
- Archive the historical data: The historical performance against the retired SLO is archived, not deleted. Future investigations may need the data. The archive is read-only and clearly labeled "retired SLO; performance against historical target."
- Update related artifacts: Runbooks that referenced the retired SLO get updated. Alerts that fired on its burn rate get retired. Dashboards that displayed it get cleaned up. Each artifact has its own lifecycle; the SLO retirement triggers the corresponding cleanups.
The process is small: document the retirement and the decision behind it, tell the people who depend on the SLO, archive the historical data, and clean up the artifacts that referenced it. Done routinely, it keeps the SLO catalog clean. Skipped, it produces the long-term clutter that makes the practice harder to defend.
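One way to keep the retirement steps from being skipped is to capture them as data next to the SLO definitions. The sketch below assumes a source-controlled Python config; `RetirementRecord`, `RetirementChecklist`, and all field names and example values are hypothetical, chosen only to mirror the steps listed above.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class RetirementRecord:
    """The written record plus the audit trail of the decision."""
    slo_name: str
    reason: str
    retired_on: date
    proposed_by: str
    approved_by: str
    archive_location: str  # where the read-only historical data lives
    replaced_by: Optional[str] = None  # successor SLO, if anything replaces it


@dataclass
class RetirementChecklist:
    """Tracks the follow-up steps that the retirement triggers."""
    record: RetirementRecord
    stakeholders_notified: bool = False
    data_archived_read_only: bool = False
    runbooks_updated: bool = False
    alerts_retired: bool = False
    dashboards_cleaned: bool = False

    def outstanding(self) -> List[str]:
        """Return the steps that have not been completed yet."""
        steps = {
            "notify stakeholders": self.stakeholders_notified,
            "archive historical data": self.data_archived_read_only,
            "update runbooks": self.runbooks_updated,
            "retire burn-rate alerts": self.alerts_retired,
            "clean up dashboards": self.dashboards_cleaned,
        }
        return [name for name, done in steps.items() if not done]


# Example with made-up values.
record = RetirementRecord(
    slo_name="checkout-availability",
    reason="service decommissioned",
    retired_on=date(2024, 6, 1),
    proposed_by="alice",
    approved_by="bob",
    archive_location="archive/slo/checkout-availability",
)
checklist = RetirementChecklist(record, stakeholders_notified=True)
print(checklist.outstanding())
```

Making the record immutable (`frozen=True`) reflects the idea that the audit trail should not be edited after the fact, while the checklist stays mutable because its steps complete over time.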
Avoid
Stale SLOs are worse than no SLO at all. A dashboard that still tracks three retired SLOs produces confusion and gradually erodes the team's trust in every number on it. The discipline is to retire SLOs as deliberately as you create them.
- Stale SLOs lie: An SLO that no longer reflects reality produces numbers that are not meaningful. The team checks the dashboard, sees a green light, and is reassured by a signal that has nothing to do with current operational state. False reassurance is dangerous.
- Lying numbers compound: Once one SLO is unreliable, the team starts treating all SLOs as advisory. The discipline that was hard to build (trust the dashboard, react to its signals) becomes easier to abandon. Stale numbers slowly poison the practice.
- Worse than no SLO: A team without an SLO knows they do not have one. A team with a stale SLO thinks they have a working signal when they do not. The latter is worse, because the false confidence prevents the team from doing the work to set a real one.
- Avoid muting instead of retiring: The wrong fix is to leave the SLO in place but stop paying attention. The right fix is to remove it from the dashboard, archive the data, and document the retirement. Mute-and-ignore is not a process; it is the path to clutter.
- Avoid keeping SLOs because of compliance theatre: Some teams keep retired SLOs because "auditors might ask." If auditors ask, "we retired this SLO on date X for reason Y, here is the documentation" is the correct answer. Keeping a stale SLO is not better.
SLO deprecation is the unglamorous discipline that keeps the SLO practice clean over years. Nova AI Ops tracks SLO definitions per service, surfaces the cases where an SLO has not been refreshed in over a year, and flags the candidates for retirement so the dashboard reflects current reality rather than accumulated history.
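The "not refreshed in over a year" check is simple enough to run against any SLO catalog. The following is a rough sketch under stated assumptions: it is not Nova AI Ops code or its API, `stale_candidates` is a hypothetical helper, and it assumes each SLO carries a last-reviewed date.

```python
from datetime import date, timedelta
from typing import Dict, List


def stale_candidates(
    last_reviewed: Dict[str, date], today: date, max_age_days: int = 365
) -> List[str]:
    """Return SLO names whose last review is older than max_age_days."""
    cutoff = today - timedelta(days=max_age_days)
    return sorted(name for name, reviewed in last_reviewed.items() if reviewed < cutoff)


# Example catalog with made-up names and dates.
catalog = {
    "checkout-availability": date(2022, 3, 1),
    "search-latency-p99": date(2024, 1, 15),
}
print(stale_candidates(catalog, today=date(2024, 6, 1)))
```

A flagged SLO is only a candidate. The quarterly review still decides whether it is retired, re-baselined, or simply re-confirmed.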