SLOs Including Third Parties
SLOs depend on vendors. Account for it.
Calculate
Most modern services depend on third-party vendors: payment processors, email providers, identity vendors, cloud platforms, observability tools. Each vendor has its own SLA. Your service's achievable SLO is bounded by what your vendors deliver. The dependency math is unforgiving and most teams discover it the hard way.
What the math actually says:
- Your SLO is bounded by the vendor SLO: If you depend on a vendor offering 99.9% availability, your service cannot reliably exceed 99.9% on the request paths that depend on the vendor. The vendor's bad day becomes your bad day; their planned maintenance affects your customers.
- Dependency fraction matters: If the vendor is on the critical path for 100% of requests, your SLO is fully exposed. If the vendor is on the critical path for 5% of requests (a specific feature that uses them), your exposure is limited to that 5%. The architectural question is what fraction of traffic depends on the vendor.
- Math required, not vibes: Compute the actual number. For each vendor, the availability it allows you is 1 - f × (1 - A), where A is the vendor's availability and f is the fraction of requests that depend on it. Multiply those terms across vendors, then by your own SLO ceiling, assuming failures are independent. The product is the realistic target. Setting a target above the product is overcommitment; setting it below is underinvestment.
- Multi-vendor compounds: A service that depends on a payment vendor (99.95%) AND an email vendor (99.9%) AND a cloud platform (99.99%), each on the critical path for every request, has a composite ceiling around 99.84% before the team's own code adds any failure. The composite is what should anchor the team's SLO target.
- Vendor SLA versus actual delivery: The published SLA is one number; the actual historical performance is another. Some vendors over-deliver; some under-deliver. Use both: the SLA is the contractual ceiling; the historical record is the realistic baseline.
The math is the input to honest SLO target setting. Without it, the team commits to numbers their architecture cannot defend.
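The composite calculation can be sketched in a few lines of Python. This is a minimal illustration, not a definitive model: the function names are hypothetical, vendor failures are assumed to be independent, and each vendor is described by an assumed (availability, dependent fraction) pair. The availabilities are the example figures from the list above.

```python
from math import prod


def path_availability(vendor_availability: float, dependent_fraction: float) -> float:
    """Availability one vendor allows, weighted by the share of requests that need it."""
    return 1.0 - dependent_fraction * (1.0 - vendor_availability)


def composite_ceiling(vendors, own_ceiling: float = 1.0) -> float:
    """Multiply per-vendor terms together (independence assumed), then by our own ceiling.

    vendors: iterable of (availability, dependent_fraction) pairs.
    """
    return own_ceiling * prod(path_availability(a, f) for a, f in vendors)


# Payment (99.95%), email (99.9%), cloud (99.99%), all on every request.
full = composite_ceiling([(0.9995, 1.0), (0.999, 1.0), (0.9999, 1.0)])

# A single 99.9% vendor that only 5% of requests depend on.
partial = composite_ceiling([(0.999, 0.05)])

print(f"all three on the critical path: {full:.2%}")
print(f"one vendor on 5% of requests:   {partial:.3%}")
```

The first case comes out at roughly 99.84%, matching the composite figure above; the second stays near 99.995%, which is why the dependent fraction matters as much as the vendor's headline number.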
Buffer
The architectural response to vendor dependency is to design for realistic vendor performance, not the published SLA. Treat the vendor's SLA as the ceiling: routine performance sits below it, and incidents push it lower still. Your design has to absorb the variance.
- Don't promise more than vendors deliver: If your vendor's actual delivered availability is 99.85% (despite a 99.9% SLA), your achievable SLO on dependent paths is 99.85%. Committing to 99.9% to your customers means you will miss your target in every quarter in which the vendor has an incident, which is most quarters.
- Be honest about what is achievable: The marketing pressure is to publish tight SLAs that match the competition. The engineering reality is that vendor dependencies cap the achievable performance. The honest answer is to publish what you can defend; the dishonest answer is to publish what sounds impressive.
- Buffer for vendor incidents: Accept that in some quarters the vendor will breach their SLA. Your SLO target must include enough budget headroom to absorb expected vendor incidents. A 99.9% target with no vendor budget is brittle; a 99.5% target that explicitly includes vendor budget is sustainable.
- Caching reduces dependence: A cache in front of a vendor call reduces the fraction of traffic that depends on the vendor. A higher cache hit rate means lower dependency, which raises the achievable SLO. The cache architecture is part of the dependency-management strategy.
- Multi-vendor for redundancy: Some categories support multi-vendor architectures: two payment processors, two email vendors, two SMS providers. The redundancy reduces dependency on any single vendor. The cost is real (multiple integrations to maintain), but the SLO benefit is significant for revenue-critical paths.
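The effect of caching and redundancy on the dependency math can be approximated as follows. This is a rough sketch under optimistic assumptions that real systems will not fully meet: cache hits never touch the vendor and the cache itself never fails, failover between vendors is instant and perfect, and vendor failures are independent. The function names are hypothetical.

```python
def effective_fraction(dependent_fraction: float, cache_hit_rate: float) -> float:
    """Share of requests that still reach the vendor after cache hits are served locally."""
    return dependent_fraction * (1.0 - cache_hit_rate)


def redundant_availability(*availabilities: float) -> float:
    """Availability of a vendor pool that is down only when every vendor is down at once."""
    all_down = 1.0
    for a in availabilities:
        all_down *= 1.0 - a
    return 1.0 - all_down


# A 99.9% vendor on every request, fronted by a cache with an 80% hit rate.
fraction = effective_fraction(1.0, 0.80)
cached = 1.0 - fraction * (1.0 - 0.999)

# Two independent 99.9% vendors with failover between them.
pooled = redundant_availability(0.999, 0.999)

print(f"fraction still exposed: {fraction:.0%}")
print(f"ceiling with cache:     {cached:.3%}")
print(f"ceiling with two vendors: {pooled:.4%}")
```

Both numbers are upper bounds. Correlated outages, stale cache entries, and imperfect failover all pull the real figure down, so measured performance should override the model where the two disagree.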
The buffer is what makes the SLO honest. Without it, the team is making promises that vendor incidents will routinely break.
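To make the headroom concrete, the budget can be expressed in minutes. The sketch below uses the 99.9% and 99.5% targets and the 99.85% delivered vendor availability from the examples above; the 90-day quarter and the function names are assumptions made for illustration.

```python
def budget_minutes(target: float, period_days: int = 90) -> float:
    """Total allowed downtime, in minutes, for an availability target over the period."""
    return (1.0 - target) * period_days * 24 * 60


def remaining_after_vendor(
    target: float,
    vendor_delivered: float,
    dependent_fraction: float = 1.0,
    period_days: int = 90,
) -> float:
    """Budget left for our own failures once expected vendor downtime is subtracted."""
    total = budget_minutes(target, period_days)
    vendor = dependent_fraction * budget_minutes(vendor_delivered, period_days)
    return total - vendor


# Vendor actually delivers 99.85% and sits on every request.
brittle = remaining_after_vendor(0.999, 0.9985)
sustainable = remaining_after_vendor(0.995, 0.9985)

print(f"99.9% target: {brittle:.1f} minutes left")
print(f"99.5% target: {sustainable:.1f} minutes left")
```

A negative result means the vendor alone is expected to consume more than the whole budget, which is the brittle case. A positive result is the headroom available for the team's own incidents.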
Review
Vendor SLAs change. New SLAs get published; old ones get downgraded; vendor performance drifts up or down. Your SLO target needs to track these changes; otherwise it gets stale relative to the dependencies it sits on top of.
- When the vendor changes its SLA, yours adjusts: Suppose the vendor moves from 99.95% to 99.9%. Your achievable SLO drops correspondingly. The change should propagate to your SLO target promptly; otherwise you are committing to numbers the vendor no longer supports.
- Tracking required: The team maintains an inventory of vendor SLAs and tracks changes. Vendors update SLAs in their service agreements; the procurement team or security team flags changes. The inventory feeds into the SLO target review.
- Quarterly review: Each quarter, review the vendor SLA inventory against the actual vendor performance and against your SLO targets. Misalignments get addressed: tighten your target if vendors are over-delivering; relax it if vendors are under-delivering.
- Renegotiate vendor SLAs: Some vendor SLAs are negotiable, especially for enterprise contracts. If the vendor's standard 99.9% is not enough for your use case, negotiate 99.95% with corresponding pricing. The contract terms become part of your reliability budget.
- Track vendor incident impact on your SLO: When a vendor has an incident, attribute the SLO budget consumption to the vendor in your retro. Multi-quarter patterns surface vendors that consistently consume disproportionate budget. The attribution is the data for vendor-replacement decisions.
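The inventory check and the budget attribution described above could look something like this. It is a hypothetical sketch: the record fields, vendor names, availability figures, and incident minutes are invented for illustration, and a real inventory would be fed from contracts and monitoring data.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class VendorRecord:
    name: str
    sla: float        # contractual availability
    delivered: float  # availability measured over the quarter


def review(records, slo_target: float):
    """Flag vendors that breached their SLA or cannot support the SLO target."""
    findings = []
    for r in records:
        if r.delivered < r.sla:
            findings.append((r.name, "breached its SLA"))
        if min(r.sla, r.delivered) < slo_target:
            findings.append((r.name, "cannot support the SLO target"))
    return findings


def attribute_budget(incidents):
    """Share of consumed budget per cause, from (cause, minutes) pairs."""
    totals = defaultdict(float)
    for cause, minutes in incidents:
        totals[cause] += minutes
    consumed = sum(totals.values())
    if consumed == 0:
        return {}
    return {cause: minutes / consumed for cause, minutes in totals.items()}


inventory = [
    VendorRecord("payments", sla=0.9995, delivered=0.9997),
    VendorRecord("email", sla=0.999, delivered=0.9985),
]
findings = review(inventory, slo_target=0.999)
shares = attribute_budget([("email", 30.0), ("own-code", 10.0), ("email", 20.0)])

for name, issue in findings:
    print(f"{name}: {issue}")
for cause, share in shares.items():
    print(f"{cause}: {share:.0%} of consumed budget")
```

Running the attribution over several quarters, rather than one, is what surfaces the vendors that consistently consume a disproportionate share.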
Third-party SLO management is the discipline of acknowledging that your reliability is partly someone else's reliability. Nova AI Ops tracks vendor SLA inventories, attributes SLO budget consumption to specific vendors, and surfaces the cases where vendor performance is the binding constraint on your committed SLO target.