When SLO and SLA Mismatch
Engineering knows the SLO; legal commits to the SLA.
Risk
SLA-versus-SLO mismatches are one of the most damaging organizational failures in reliability practice. Sales or marketing publishes an SLA that engineering cannot actually deliver. The contract is in writing; the engineering team learns about it only when they miss it; the penalties fire and the team's credibility erodes. The fix is alignment up front, not damage control after.
What the risk actually is:
- SLA tighter than SLO: The customer-facing SLA promises 99.99% availability. The engineering team's internal SLO target is 99.9%. The team is operating to hit 99.9%; the contract requires 99.99%. The mismatch is structural; the contract will be breached in any quarter where the system performs exactly as engineering intends.
- Engineering cannot deliver what was promised: The team's architecture, staffing, and operational practice were designed for the lower target. Hitting the higher target requires different architecture (multi-region), different staffing (24/7 on-call), and different practices (canary releases, automated rollback). The investment to close the gap is substantial.
- Penalty risk: Most SLAs include service credits (typically 10 to 50% of monthly fees) for breaches. Repeated breaches across many customers compound into significant financial impact. Worse, repeated breaches damage the company's standing in customers' procurement evaluations for years.
- Engineering credibility damaged: When customers experience the SLA being missed, they blame engineering. When engineering leadership eventually finds out about the misalignment, they blame sales. The internal acrimony erodes the relationships needed to fix the underlying issue.
- Recovery is expensive: Once the SLA is published, walking it back requires customer notification and contract amendments, and it carries a reputation cost. The contractual unwinding may take years; the credibility cost may take longer. The mismatch is much cheaper to prevent than to recover from.
The risk is real and it is one of the most preventable problems in reliability practice. Alignment before commitment is the discipline.
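The gap between the two targets in the example above is easier to see as downtime minutes. The following is a minimal sketch, assuming a 30-day measurement window and that availability is measured as uptime over total time; the function names are hypothetical and not taken from any particular tool.

```python
# Sketch: downtime allowed by an availability target.
# Assumptions (not from the article): a 30-day window, and availability
# measured as uptime divided by total time.

def allowed_downtime_minutes(target_pct: float, window_days: int = 30) -> float:
    """Minutes of downtime permitted in the window at the given target."""
    total_minutes = window_days * 24 * 60
    return total_minutes * (1 - target_pct / 100)


def breaches(downtime_minutes: float, target_pct: float, window_days: int = 30) -> bool:
    """True if the observed downtime exceeds what the target allows."""
    return downtime_minutes > allowed_downtime_minutes(target_pct, window_days)


slo_budget = allowed_downtime_minutes(99.9)    # what engineering operates to
sla_budget = allowed_downtime_minutes(99.99)   # what the contract promises

print(f"SLO 99.9%  allows {slo_budget:.2f} min per 30 days")
print(f"SLA 99.99% allows {sla_budget:.2f} min per 30 days")

# A team that spends only half of its internal SLO budget still breaches the SLA.
half_slo_budget = slo_budget / 2
print("Breaches SLA:", breaches(half_slo_budget, 99.99))
```

Under these assumptions, 99.9% allows roughly 43 minutes of downtime per 30 days, while 99.99% allows roughly 4. A month that engineering would count as comfortable can therefore still be a contractual breach.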
Align
The fix is making the SLA derive from the SLO, with an explicit buffer between them. Engineering operates against the internal SLO; the customer-facing SLA is looser; the buffer absorbs the variability. This is the structural relationship that keeps the SLA achievable.
- SLA equals SLO minus a buffer: If engineering operates against an internal SLO of 99.95%, the published SLA is 99.9%. The 0.05-percentage-point buffer is the absorption capacity for routine variability. The SLA is what customers see; the SLO is what engineering hits to comfortably meet the SLA.
- Buffer protects against breach: Real systems have variability that the SLA cannot tolerate but the engineering operation must absorb: maintenance windows, dependency outages, regional events. The buffer is what lets the team recover from these without breaching the customer commitment.
- Honest about achievable levels: The SLA is set at what engineering can credibly defend, not at what marketing wants to claim. If engineering can credibly hit 99.9%, the SLA is 99.9% and the SLO is 99.95%. Setting the SLA at 99.99% when engineering operates at 99.9% guarantees breach.
- Document the relationship: The internal SLO target, the customer-facing SLA, and the buffer between them are all documented. Future leaders, future engineers, and future contract negotiations all reference the same documents. The alignment survives leadership changes.
- Tighten one before the other: When engineering capability improves, tighten the SLO first. Once the new SLO has been comfortably met for several quarters, tighten the SLA to follow it, still leaving a buffer. The progression keeps the buffer healthy; tightening the SLA before the SLO produces immediate breach risk.
The buffer-based alignment is what makes SLA commitments sustainable. Without it, the SLA is wishful; with it, the SLA is grounded.
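The buffer relationship and the tightening order can be written down as a small check. This is a sketch under stated assumptions: the class and function names are hypothetical, and "several quarters" is represented by an assumed default of three, which the article does not specify.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReliabilityTargets:
    slo_pct: float  # internal target engineering operates against
    sla_pct: float  # looser, customer-facing commitment

    @property
    def buffer_pct(self) -> float:
        """Buffer in percentage points (SLO minus SLA), rounded to avoid float noise."""
        return round(self.slo_pct - self.sla_pct, 6)

    def is_aligned(self) -> bool:
        """Aligned means the SLA is strictly looser than the SLO."""
        return self.buffer_pct > 0


def can_tighten_sla(targets: ReliabilityTargets,
                    proposed_sla_pct: float,
                    quarters_slo_met: int,
                    required_quarters: int = 3) -> bool:
    """Allow a tighter SLA only if it stays below the SLO and the SLO has a track record.

    required_quarters is an assumed threshold standing in for "several quarters".
    """
    keeps_buffer = proposed_sla_pct < targets.slo_pct
    has_track_record = quarters_slo_met >= required_quarters
    return keeps_buffer and has_track_record


aligned = ReliabilityTargets(slo_pct=99.95, sla_pct=99.9)
mismatched = ReliabilityTargets(slo_pct=99.9, sla_pct=99.99)

print(aligned.buffer_pct, aligned.is_aligned())
print(mismatched.buffer_pct, mismatched.is_aligned())

# SLO was tightened first to 99.99; the SLA may follow only after a track record.
after_slo_tightened = ReliabilityTargets(slo_pct=99.99, sla_pct=99.9)
print(can_tighten_sla(after_slo_tightened, 99.95, quarters_slo_met=4))
print(can_tighten_sla(after_slo_tightened, 99.95, quarters_slo_met=1))
```

Keeping both numbers in one record mirrors the "document the relationship" point: the buffer is derived from the two targets rather than stored separately, so it cannot drift out of sync with them.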
Review
The remaining practice is the regular review of SLAs against engineering capability, annually at minimum and more often if the operating reality is shifting. The review catches the cases where the SLA and the SLO have drifted apart over time.
- Annual SLA review with engineering: Once a year, the customer-facing SLAs are reviewed against engineering's capacity and trajectory. Are we hitting the SLAs comfortably? Are we missing them? Is the buffer healthy? Are upcoming changes (architecture migrations, dependency shifts) going to affect the relationship?
- No surprises in the review: The review is informed by the actual SLO performance over the year. Engineering already knows whether it hit the SLAs; sales already knows whether the team is on track. The review is forward-looking; it is not the place where breaches are discovered for the first time.
- Cross-functional participation: Engineering leadership, sales leadership, customer success leadership, and legal all participate. Each has a stake in the alignment; each contributes to the decision about whether to adjust SLAs.
- Adjust SLAs deliberately: If the review concludes that an SLA needs to change (tighten because performance has improved, or relax because the operating reality has shifted), the change is deliberate, communicated to customers, and coordinated with sales. Quietly adjusting SLAs produces customer trust failures.
- Catch new SLAs before they are signed: The review process includes a step where new SLA commitments (in customer contracts, in marketing materials, in RFP responses) get engineering sign-off before they are signed. This catches the most common cause of SLA-SLO mismatch: sales committing to numbers engineering cannot deliver.
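The "is the buffer healthy?" question in the review can be answered from measured availability per period. The sketch below uses made-up monthly figures and hypothetical names; it classifies each period as healthy, as having eaten into the buffer, or as an SLA breach.

```python
from collections import Counter


def classify_period(observed_pct: float, slo_pct: float, sla_pct: float) -> str:
    """Place one period's measured availability relative to the SLO and SLA."""
    if observed_pct < sla_pct:
        return "sla_breach"        # customer commitment missed
    if observed_pct < slo_pct:
        return "buffer_consumed"   # SLO missed, SLA still met
    return "healthy"               # SLO met, buffer untouched


def review_summary(observed: list[float], slo_pct: float, sla_pct: float) -> Counter:
    """Count how many periods fall into each category."""
    return Counter(classify_period(o, slo_pct, sla_pct) for o in observed)


# Illustrative data only: four months of measured availability.
monthly_availability = [99.97, 99.93, 99.96, 99.88]
summary = review_summary(monthly_availability, slo_pct=99.95, sla_pct=99.9)

for category, count in summary.items():
    print(f"{category}: {count}")
```

A rising count of "buffer_consumed" periods is the early signal worth bringing to the review, since it shows the SLO slipping before any customer-facing breach occurs.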
SLA-SLO alignment is one of those organizational disciplines where the cost of getting it wrong is much higher than the cost of getting it right. Nova AI Ops tracks per-SLA and per-SLO performance, surfaces the cases where the buffer is at risk, and produces the data that makes the annual SLA review decision-ready rather than discovery-oriented.