SLO for New Services
First 3 months: shadow SLO.
Shadow mode
The temptation when launching a new service is to publish an aggressive SLO target on day one. The team commits to 99.9% before a single real user has hit the service. Three months later, when the actual operating reality is closer to 98%, the team is missing the target every month and the customer-facing commitment is at risk. The fix is shadow mode: track the SLO from day one, but do not enforce it until you have data.
What shadow mode actually involves:
- Track SLO from launch: The metric pipeline is wired up at the moment the service launches. Latency histograms, error rates, success counters, freshness tracking. The data accumulates from the first request, even though no SLO target is yet committed (a minimal sketch of this track-but-don't-enforce pattern follows this list).
- Don't enforce yet: No burn-rate alerts, no deploy freezes, no error budget policy in effect. Misses do not trigger consequences. The mode is observation only.
- Build the data: The shadow period (typically 60 to 90 days) accumulates the operational data needed to set an honest target. Weekly traffic patterns, time-of-day variations, dependency outage frequency, the cost of routine maintenance windows. None of this is visible in design docs; it surfaces in the data.
- Internal dashboard, not customer-facing: SLO performance is visible to the engineering team but not to customers. The dashboard tracks the running median, the worst-case window, the dimensional breakdown. The team learns what their service actually does.
- Iterate on the SLI definitions: Shadow mode is where you learn whether your SLI definitions match user experience. A latency target that included redirects when it should not have. An availability calculation that excluded maintenance windows when customers were actually affected. The corrections happen in shadow, before any commitment is on the line.
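A minimal sketch of the track-but-don't-enforce pattern in Python. The `ShadowSLITracker` class, its fields, and the service name are hypothetical, not any particular vendor's API; the point is that recording and reporting exist from the first request while alerting deliberately does not, and that SLI definition choices (such as excluding redirects from the latency SLI) live in one place where they are cheap to iterate on during shadow.

```python
from dataclasses import dataclass, field

# Hypothetical shadow-mode tracker: records SLI events and reports the
# achieved level, but deliberately has no alerting or enforcement hooks.
@dataclass
class ShadowSLITracker:
    name: str
    good: int = 0
    total: int = 0
    latencies_ms: list[float] = field(default_factory=list)

    def record(self, ok: bool, latency_ms: float, is_redirect: bool = False) -> None:
        # SLI definition decisions live here and are cheap to change during
        # shadow: here redirects count toward availability but are excluded
        # from the latency SLI (an assumption for illustration).
        self.total += 1
        if ok:
            self.good += 1
        if not is_redirect:
            self.latencies_ms.append(latency_ms)

    def achieved_availability(self) -> float:
        return self.good / self.total if self.total else 1.0

    def latency_p99_ms(self) -> float:
        if not self.latencies_ms:
            return 0.0
        ranked = sorted(self.latencies_ms)
        return ranked[min(len(ranked) - 1, int(0.99 * len(ranked)))]

# Internal dashboard output only; no burn-rate alerts, no deploy freezes.
tracker = ShadowSLITracker("checkout-api")  # hypothetical service name
tracker.record(ok=True, latency_ms=120.0)
tracker.record(ok=False, latency_ms=950.0)
print(f"{tracker.name}: availability {tracker.achieved_availability():.2%}, "
      f"p99 {tracker.latency_p99_ms():.0f} ms")
```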
Shadow mode is the discipline that prevents the most common new-service mistake: publishing an aspirational SLO that the operating reality cannot support. Three months of patient observation is the cheapest form of insurance.
Validate
After the shadow period, the team has real data. The next move is using that data to set the initial SLO target honestly. The data is the input; the target is the output of the analysis.
- After 90 days, set initial target: 90 days is enough to capture seasonality, dependency variation, and the operational learning curve. Pull the data, run the baseline analysis, identify the realistic target range. The first committed SLO emerges from this analysis, not from a marketing meeting.
- Honest target, not aspirational: The target reflects what the service has actually been doing, with a small stretch (a 10 to 20% improvement on the achieved baseline). Setting the target at the achieved baseline locks in current performance; setting it well above produces chronic misses. The middle is a real stretch the team can credibly defend (a worked sketch of this arithmetic follows this list).
- Per-dimension targets: Set targets for availability, latency, error rate, and freshness separately, each based on its own 90-day data. The composite SLO follows from the per-dimension targets. Setting the composite first and decomposing it is harder and produces less defensible numbers.
- Document the target rationale: Write up why this is the target: what the data showed, what the dependencies imply, what the architectural ceiling is. The rationale survives leadership changes and helps future conversations about adjusting the target.
- Plan for revisit: The first SLO target is a hypothesis. It will be revisited at the first quarterly review, with the additional data accumulated by then. The team should expect to adjust within the first year as the system matures.
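A worked sketch of the target arithmetic, assuming "improvement on the achieved baseline" means shrinking the observed failure rate (the complement of the achieved level) by 10 to 20%. All numbers are illustrative, not real service data, and the composite uses a simple independence assumption as a first cut.

```python
# Illustrative 90-day achieved levels from shadow tracking; not real data.
achieved = {
    "availability": 0.992,  # ratio of good events
    "latency":      0.985,  # fraction of requests under the latency threshold
    "error_rate":   0.996,  # fraction of non-error responses
    "freshness":    0.990,  # fraction of reads that served fresh data
}

def stretch_target(baseline: float, improvement: float = 0.15) -> float:
    """Shrink the observed failure rate by `improvement` (10 to 20% is the
    range suggested above; 15% is an assumed midpoint)."""
    failure_rate = 1.0 - baseline
    return 1.0 - failure_rate * (1.0 - improvement)

targets = {dim: stretch_target(base) for dim, base in achieved.items()}
for dim, target in targets.items():
    print(f"{dim}: achieved {achieved[dim]:.3%} -> proposed {target:.3%}")

# First-cut composite: the product of per-dimension targets, which is the
# joint success rate if the dimensions fail independently.
composite = 1.0
for target in targets.values():
    composite *= target
print(f"composite SLO implied by per-dimension targets: {composite:.3%}")
```

With these illustrative numbers, a 99.2% achieved availability becomes a proposed 99.32% target: tighter than the baseline, but far from the 99.9% a marketing meeting would have picked.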
The validation step turns shadow data into a real commitment. The team that does this carefully ships SLOs they can defend; the team that skips it ends up walking back targets within a quarter.
Communicate
Throughout the shadow period and after validation, communication discipline matters. Customers who were told the service has an SLO when it actually has shadow tracking have been misled; customers who are told nothing assume the worst. The right framing is honest about where the practice is.
- Stakeholders know it is shadow: Internal stakeholders (sales, customer success, leadership) know that the new service is in shadow SLO mode. They are not pitching availability numbers to customers. They are positioning the service as new and continuing to evolve.
- No false promises: The marketing materials, the SLA page, and the public docs do not commit to specific reliability numbers during shadow. "Available with best-effort reliability during the early access period," or similar. The commitment is honest about the maturity level.
- Clear transition signal: When the service moves from shadow to GA, the SLO commitment becomes public. The status page is added, the SLA page is updated, and customer comms acknowledge the new commitment. The transition is a deliberate event, not a quiet rollover.
- Customer expectations set explicitly: Early customers using the service during shadow are told what to expect: "We're targeting 99% availability internally during the early access period; we expect to commit to a published number after 90 days of operating data." Honesty wins customers more than overcommitment ever does.
- Document the journey: The transition from shadow to GA SLO is captured in writing: when shadow started, what data was used to set the initial target, what the target is now, and when the next review will be. The documentation defends the decision later when someone asks how the target was chosen (a sketch of such a record follows this list).
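One way to make the transition a deliberate, auditable event is to keep the journey as a structured record in version control. A sketch, with hypothetical field names and illustrative dates rather than any specific tool's schema:

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical transition record; field names and dates are illustrative.
# Kept in version control so the move from shadow to GA is a reviewable
# event with history, not a quiet rollover.
@dataclass(frozen=True)
class SLOTransitionRecord:
    service: str
    shadow_started: date
    committed_on: date
    target: float      # the first committed SLO target
    rationale: str     # what the 90-day data showed
    next_review: date  # first quarterly revisit

record = SLOTransitionRecord(
    service="checkout-api",  # same hypothetical service as above
    shadow_started=date(2024, 1, 15),
    committed_on=date(2024, 4, 15),
    target=0.9932,
    rationale="90-day achieved 99.2%; 15% failure-rate stretch; baseline doc v1",
    next_review=date(2024, 7, 15),
)
```

Because the record is a versioned artifact, the answer to "how was this target chosen" is a file with a history, not someone's memory.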
SLO shadow mode for new services is the cheapest reliability investment a team can make at launch. Nova AI Ops supports shadow SLO tracking from day one, generates the baseline analysis when the data is sufficient, and helps the team transition cleanly from shadow to committed SLO with the audit trail intact.