SLO Negotiation With Product
Product wants tight SLOs; engineering knows the cost.
Data
SLO negotiation is the conversation between engineering and stakeholders (product, sales, customers, leadership) about what reliability target to commit to. The conversation goes well when it is data-driven; it goes poorly when it is opinion-driven. The first move in any negotiation is to bring the data that grounds the conversation in what is actually achievable.
What data-driven negotiation requires:
- Show actual capability: Pull the historical performance data. Show the actual availability the team has been delivering, the actual latency, the actual error rate. The data tells stakeholders what the team is currently producing; the negotiation builds from this baseline.
- Translate investment to capability: "We can deliver 99.9% with current investment. Reaching 99.95% requires multi-region architecture, additional 24/7 oncall, automated rollback. The investment is roughly 4 engineer-quarters plus $X infrastructure cost." The conversation shifts from "we want it tighter" to "here is what tighter costs."
- Concrete numbers, not aspirational ones: Stakeholders sometimes want to commit to numbers that sound good. The data prevents this. "99.99% would require Y investment we have not made; here is the path if you want to fund it." The math is on the table; the decision is informed.
- Per-service capability: Different services have different achievable SLOs. The data covers each service separately. The composite SLO across services follows from the per-service capabilities; pretending all services can hit the same target is unrealistic.
- Trajectory matters: The team's capability is improving (or degrading) over time. The data shows the trajectory. A team trending upward can plausibly commit to slightly more aspirational targets; a team trending downward should commit to less. The trend grounds the future commitment.
Data-driven negotiation produces commitments the team can keep. Opinion-driven negotiation produces commitments the team will miss.
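The per-service and trajectory points above come down to simple arithmetic. The sketch below is illustrative only: the function names, service names, and request counts are hypothetical, and the composite calculation assumes a request must pass through every service and that failures are independent, which real systems often violate.

```python
from math import prod


def availability(good_requests: int, total_requests: int) -> float:
    """Fraction of requests that succeeded over the measurement window."""
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    return good_requests / total_requests


def composite_availability(per_service: dict[str, float]) -> float:
    """Availability of a request path that needs every service to succeed.

    Assumes serial dependencies and independent failures (a simplification).
    """
    return prod(per_service.values())


def average_monthly_change(monthly: list[float]) -> float:
    """Mean month-over-month change; positive means capability is improving."""
    if len(monthly) < 2:
        return 0.0
    return (monthly[-1] - monthly[0]) / (len(monthly) - 1)


# Hypothetical measurements, not real data.
measured = {
    "search": availability(9_990_000, 10_000_000),
    "checkout": availability(4_997_500, 5_000_000),
}
print(f"composite: {composite_availability(measured):.4%}")
print(f"trend: {average_monthly_change([0.9980, 0.9985, 0.9990]):+.4%} per month")
```

Under these assumptions the composite figure comes out lower than either service's own availability, which is why a single shared target across services tends to be unrealistic.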
Trade-offs
The negotiation has to surface the trade-offs explicitly. Tighter SLOs cost something. Looser SLOs cost something else. Stakeholders need to understand both costs to make an informed decision.
- Tighter SLO equals less feature velocity: Engineering capacity invested in reliability is engineering capacity not invested in features. The trade is real and quantifiable. If half the team is on reliability work, half is not on feature work; product needs to know that math when committing to the SLO.
- Looser SLO equals customer experience risk: A 99% availability SLO allows roughly 7.2 hours of downtime in a 30-day month. Customers experience the downtime; some churn. The looser target trades reliability for velocity; the cost is customer trust.
- Honest about the math: The trade-off is not "do we want to be reliable?" (everyone wants reliable). It is "how much velocity are we willing to give up for incremental reliability, and how much reliability are we willing to give up for velocity?" The honest framing produces decisions; the dishonest framing produces unrealistic commitments.
- Quantify both sides: The cost of reliability (engineering investment) and the cost of unreliability (customer churn, support volume, revenue at risk) both go on the table. The decision is the trade between the two costs, evaluated against the company's strategy.
- Document the trade: The decision is documented. "We committed to 99.9% rather than 99.95% because the additional reliability investment would have delayed two product launches. We are revisiting this decision in 6 months based on actual performance and customer feedback." The documentation defends the decision later.
The trade-off is uncomfortable to discuss explicitly. The discomfort is the signal that the conversation is real rather than performative.
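One way to put the reliability side of the trade on the table is to convert each candidate target into the downtime it permits. This is a minimal sketch, assuming a 30-day window and treating the SLO as a pure availability fraction; the function name is hypothetical.

```python
def allowed_downtime_hours(slo: float, window_days: int = 30) -> float:
    """Hours of full downtime an availability SLO permits over the window."""
    if not 0.0 < slo <= 1.0:
        raise ValueError("slo must be a fraction in (0, 1]")
    return (1.0 - slo) * window_days * 24


# Compare the downtime budget for several candidate targets.
for target in (0.99, 0.995, 0.999, 0.9995, 0.9999):
    hours = allowed_downtime_hours(target)
    print(f"{target:.2%} -> {hours:.2f} h ({hours * 60:.1f} min) per 30 days")
```

With these inputs, 99% permits 7.2 hours per 30 days, 99.9% about 43 minutes, and 99.95% about 22 minutes. Seeing the step between adjacent targets in minutes makes it easier to weigh against the engineering investment each step requires.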
Compromise
Sometimes the negotiation produces a compromise rather than a single tight or loose target. Different SLO dimensions can move independently; tightening some while relaxing others can match the actual customer needs better than uniform tightening.
- Multi-dimensional SLOs: Availability, latency, error rate, freshness, correctness. Each is a separate dimension. The team can commit to tight latency (matters most to user experience) while accepting looser availability (matters less for the use case). The combination matches the customer's actual priorities.
- Tight latency target, loose availability: A search service whose users care primarily about response time can commit to 99.9% of requests meeting the latency threshold alongside 99.5% availability. The latency commitment is what customers will judge; the availability is set realistically.
- Right shape rather than uniform tightening: The compromise is shaped to match the workload. A payment service has different priorities than a search service; the SLO shape reflects the priorities. Uniform tightening across dimensions ignores this; per-dimension calibration captures it.
- Communicate the shape to customers: The published SLA reflects the multi-dimensional commitment. Customers understand what they are getting on each dimension; expectations are calibrated. Vendors who hide behind a single composite number set customers up for surprise.
- Revisit per dimension: Each dimension's target gets reviewed independently. Latency might tighten as caching improves; availability might tighten as redundancy is added. The dimensions evolve at their own pace; the overall SLO improves piece by piece.
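A multi-dimensional SLO can be represented as a list of independent targets, each checked on its own. The sketch below uses the search-service shape from the list above; the class name, dimension names, and measured values are hypothetical and stand in for whatever the team's monitoring actually reports.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DimensionTarget:
    name: str
    target: float  # required fraction of good events, e.g. 0.999


def evaluate(targets: list[DimensionTarget], measured: dict[str, float]) -> dict[str, bool]:
    """Check each dimension against its own target, independently of the others."""
    return {t.name: measured[t.name] >= t.target for t in targets}


# Shape from the search-service example: tight latency, looser availability.
search_slo = [
    DimensionTarget("latency_within_threshold", 0.999),
    DimensionTarget("availability", 0.995),
]

# Hypothetical measurements for one review period.
measured = {"latency_within_threshold": 0.9987, "availability": 0.9962}
print(evaluate(search_slo, measured))
# {'latency_within_threshold': False, 'availability': True}
```

Keeping the dimensions separate means a miss on latency is visible even when availability is comfortably met, and each target can be revisited without touching the others.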
SLO negotiation done with data, honest trade-off discussion, and multi-dimensional compromise produces commitments that survive contact with reality. Nova AI Ops produces the historical capability data, the dimension-by-dimension breakdown, and the trade-off projections that make the negotiation evidence-based.