Incremental SLO Tightening
Tighten SLOs over time as system matures.
Idea
SLO targets should evolve over time as the service matures and the team's capability improves. The right way to evolve them is incremental tightening: small, frequent steps that the team can credibly meet, rather than aggressive jumps that produce missed targets and damaged credibility. The discipline is patience; the payoff is sustained improvement.
What incremental tightening actually looks like:
- Year 1: 99%.: A new service starts with a modest SLO target. 99% covers the basic operational discipline without requiring sophisticated architecture. The team learns to operate the service; the data accumulates; the baseline establishes.
- Year 2: 99.5%.: After a year of operating at 99%, the team has data and experience. Tightening to 99.5% requires modest investment: better tests, better monitoring, perhaps multi-AZ deployment. The investment is bounded; the new target is achievable.
- Year 3: 99.9%.: Another year of maturation. The team's capability has grown; the architecture has hardened; the operations have stabilized. Tightening to 99.9% is a meaningful step but not a leap.
- Gradual progression.: Each step is what the team can credibly commit to. Aggressive jumps (99% straight to 99.99%) require architectural changes the team has not made; the result is chronic miss and damaged credibility.
- Match the customer expectation.: The pace of tightening also matches what customers expect. Customers who started with a vendor at 99% understand that tightening takes time; they appreciate the maturation. Customers do not expect overnight transformation.
Incremental tightening is the strategy that produces sustained reliability improvement. The team commits to what they can deliver; they deliver; they tighten; they continue.
Verify
Each tightening should be supported by data. The team has been operating against the current target; the data shows whether they have headroom for tighter; the tightening commits to a target the data supports. Aspirational tightening without data produces broken commitments.
- Each tightening data-supported.: Pull the data from the current target's window. What is the team actually delivering? Is there enough margin for a tighter target? The tightening commits to a number that the data backs up.
- Don't aspire blindly.: "We want to be at 99.9% next year" is not a tightening plan; it is a wish. The plan starts with the data; the data informs what next year's target should be. If the data does not support the wished-for target, the plan should account for the investment needed to make it achievable.
- Investment plan accompanies the tightening.: If the tightening requires specific investments (multi-region, additional oncall, deeper testing), the investments are funded as part of the tightening commitment. Tightening the target without funding the investment produces miss.
- Soak before publishing.: Operate against the new internal target for a quarter before publishing it as the customer-facing SLA. The soak verifies the team can actually hit the new target; the publication follows the verification.
- Stakeholder alignment.: Engineering, product, customer success, and leadership all align on the tightening before it happens. Engineering signs off that they can deliver; product and CS communicate to customers; leadership commits to fund the investment. The alignment is what makes the tightening durable.
Verification is the discipline that prevents tightening from becoming wishful thinking. The data gates the commitment; the commitment matches the data.
Avoid
The patterns to avoid are the ones that produce missed targets and damaged credibility. Step-change tightening, aspirational targets without investment, frequent re-targeting, and silent loosening when targets are missed.
- Avoid step changes.: Going from 99% to 99.99% in one quarter is risky. The team has not built the architecture, the operations, or the muscle memory to deliver. The step change produces a target that misses immediately; the credibility cost is real.
- Avoid frequent re-targeting.: Some teams set ambitious targets, miss them, lower them, miss them again, lower them again. The cycle erodes confidence in any of the targets. Stable targets that the team consistently hits are more valuable than aspirational targets that drift.
- Avoid silent loosening.: When a target is consistently missed, the temptation is to quietly relax it. Customers notice; trust drops. The relaxation should be deliberate and communicated, not silent.
- Compound improvements over years.: The team that tightens 0.1% per year for ten years has tightened by 1.0% cumulatively, with sustained credibility throughout. The team that tries to tighten 1.0% in one year and misses has accomplished less. Slow and consistent beats fast and broken.
- Match the architecture's pace.: Architectural changes take time. Multi-region migration. Caching layer addition. Dependency hardening. Each takes quarters or years. The SLO tightening pace matches the architectural pace; tightening faster than the architecture produces gaps.
Incremental SLO tightening is one of those operational disciplines where patience pays back. Nova AI Ops tracks the SLO target trajectory over multiple years, surfaces the cases where the data supports tightening or suggests relaxing, and produces the data that anchors the tightening conversation in evidence rather than aspiration.