SLO Baseline Shift Detection
Baseline drifts; SLO becomes meaningless.
Detect
The SLO target was set against a specific baseline at a specific moment. The system has not stayed still. Code has shipped, dependencies have changed, traffic has shifted. The current operating reality might be very different from what the baseline captured. If the team does not detect baseline shift, the SLO target progressively becomes meaningless: either trivially easy (the system improved) or impossibly hard (the system degraded), and the dashboard stops being a useful signal.
How to detect baseline shift in practice:
- Compare current performance to the original baseline: Pull the trailing 90 days of SLO performance and compare it to the data used to set the original target. If the median performance has moved by more than half the gap to the target, the baseline has shifted enough to require recalibration.
- Significant change means recalibrate: A small drift (the operation has consistently been 10% better or worse than the baseline predicted) is normal noise. A large drift (consistently 50% better or 100% worse) indicates that the underlying system has changed, and recalibration is the structural response.
- Per-dimension shift detection: Latency might have shifted while availability stayed stable, or freshness might have degraded while latency improved. The shift detection runs per SLI dimension, not just on the composite, so each shift surfaces what changed in the underlying system.
- Statistical tests for noise versus signal: The shift detector uses statistical methods (Mann-Whitney U for distribution shifts, two-proportion tests for rate changes) to confirm that an apparent shift is real rather than noise. Without this, false alarms desensitize the team to real shifts.
- Cohort-aware detection: Some shifts affect only specific cohorts (a region change that affects EU customers, a traffic-mix change that affects free-tier users). The detection examines cohorts separately so a localized shift is not hidden by an aggregate that averages it out.
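The rate-change test above can be sketched with nothing beyond the standard library. This is a minimal, illustrative two-proportion z-test on good/total event counts; the function name and data shapes are assumptions, not from any specific tool. For distribution shifts on raw latency samples, a rank test such as `scipy.stats.mannwhitneyu` plays the same role.

```python
# Illustrative sketch: two-proportion z-test for availability-rate
# changes between the baseline window and the current window.
from math import erfc, sqrt

def rate_shift_p_value(good_base, total_base, good_curr, total_curr):
    """p-value that the success rate genuinely changed between windows."""
    p_base = good_base / total_base
    p_curr = good_curr / total_curr
    pooled = (good_base + good_curr) / (total_base + total_curr)
    se = sqrt(pooled * (1 - pooled) * (1 / total_base + 1 / total_curr))
    if se == 0:  # degenerate case: identical all-good or all-bad windows
        return 1.0
    z = (p_base - p_curr) / se
    return erfc(abs(z) / sqrt(2))  # two-sided p from the normal tail
```

On 100,000-request windows, a drop from 99.5% to 99.0% yields a vanishingly small p-value, so the shift is treated as real; the same half-point drop measured over only 1,000 requests per window does not clear the bar and is left as noise.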
The detection is the cheap part. Most teams can implement it as a quarterly cron job that compares last quarter's data to the baseline and emits a report. The discipline is doing it; the implementation is straightforward.
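The core check of that quarterly job can be sketched in a few lines, following the half-the-gap rule described above. Function and field names are illustrative assumptions.

```python
# Minimal sketch of the quarterly baseline-shift check. The baseline
# counts as shifted when the current median has moved by more than half
# the gap between the baseline median and the SLO target.
from statistics import median

def baseline_shift_report(current_samples, baseline_samples, target):
    baseline_med = median(baseline_samples)
    current_med = median(current_samples)
    gap_to_target = abs(target - baseline_med)
    drift = abs(current_med - baseline_med)
    return {
        "baseline_median": baseline_med,
        "current_median": current_med,
        "drift": drift,
        "gap_to_target": gap_to_target,
        "shifted": gap_to_target > 0 and drift > 0.5 * gap_to_target,
    }
```

For a latency SLI with a 200 ms target and a 120 ms baseline median, a current median of 170 ms means a 50 ms drift against an 80 ms gap; 50 exceeds half of 80, so the report flags a shift. A current median of 130 ms would not.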
Respond
Once a baseline shift is detected, the team has to decide what to do. Ignoring the shift is the worst option; it is also the most common. The right response depends on which way the shift went and what is causing it.
- Tighten the SLO if performance improved: If the system has been consistently outperforming the target, the target is too loose. Tightening it captures the new operating reality and ensures the SLO continues to drive engineering investment. The previous target becomes the floor; the tightened target is the new aim.
- Accept the new baseline if performance degraded due to a permanent change: If the system genuinely cannot achieve the original target because of architectural changes, scale changes, or dependency degradation, accept the new baseline by relaxing the target. Pretending the original target is still achievable produces a chronic SLO miss without changing the underlying operation.
- Investigate the cause before recalibrating: A shift is information about the system. Before tightening or relaxing, the team investigates why the shift happened. Sometimes the cause is a fixable regression (which should be fixed, not accommodated); sometimes it is a structural change (which should be accommodated, not fought).
- Don't ignore the shift: The worst response is to leave the SLO at its original target while the system has materially shifted. The dashboard tells one story; reality tells another. The team learns that the SLO is not actually a serious commitment, and the practice unwinds.
- Communicate the change: When the SLO target changes, document why and notify stakeholders. Customers depending on the published number may need to be told, and internal teams that depend on the service may need to update their assumptions. The change is deliberate, not silent.
Responding to baseline shift is what keeps the SLO practice honest over years. Teams that recalibrate when needed produce SLOs that match operational reality; teams that ignore shifts produce SLOs that do not.
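The decision logic above can be condensed into a small policy sketch. The cause labels here are assumptions for illustration; the point is the ordering: investigate first, fix regressions rather than accommodate them, and recalibrate (in either direction) only for genuine structural change.

```python
# Illustrative sketch of the baseline-shift response policy.
from enum import Enum

class ShiftCause(Enum):
    FIXABLE_REGRESSION = "fixable_regression"
    STRUCTURAL_CHANGE = "structural_change"
    UNKNOWN = "unknown"

def respond_to_shift(performance_improved: bool, cause: ShiftCause) -> str:
    """Map a detected shift to the recommended action. Never ignore."""
    if cause is ShiftCause.UNKNOWN:
        return "investigate: a shift is information; find the cause first"
    if cause is ShiftCause.FIXABLE_REGRESSION:
        return "fix the regression; do not recalibrate around it"
    if performance_improved:
        return "tighten the target; the previous target becomes the floor"
    return "relax the target to the new baseline; document and notify stakeholders"
```

Note there is deliberately no branch that leaves the original target in place while the system has shifted: silence is the one response the policy excludes.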
Review cadence
Baseline shift detection is too important to be done ad hoc. The discipline is a regular review on a fixed cadence, with the same rigor each time. The cadence catches shifts before they become large enough to be embarrassing.
- Quarterly review: Once a quarter, the team runs the baseline shift analysis as part of the SLO review. The output is a list of services where the baseline has shifted enough to warrant action. Each entry gets discussed; each action is documented.
- Catches drift at the right granularity: Quarterly hits the sweet spot: shifts that develop within a single quarter are usually small enough to ignore, while shifts that develop across multiple quarters need to be caught before the gap widens.
- Tied to the broader SLO review: The baseline shift check is part of the standard quarterly SLO review, not a separate meeting. The SLO performance review naturally includes the question "is the target still right," and the combined meeting is shorter and more focused than two separate ones.
- Document the analysis: The output of each review is captured: which services shifted, what the shift was, what the team decided, and why. The history of these decisions becomes the institutional knowledge of how the SLO targets have evolved.
- Annual deeper review: Once a year, the team re-baselines from scratch: pull a fresh 90 days of data, run the same baseline analysis that was done originally, and decide whether the current targets still match. Annual is the budget for the deeper work; quarterly is the budget for the lighter check.
SLO baseline shift detection is one of those quiet disciplines that distinguishes mature reliability practices from immature ones. Nova AI Ops runs the baseline-shift analysis automatically per service, surfaces the shifts that exceed configurable thresholds, and produces the per-quarter report that the team can use as the input to the SLO review meeting.