SLO Validation: Check Your Math

SLOs based on bad data are misleading.

Data quality

SLO calculations are only as trustworthy as the data they are computed from. The pipeline that produces the SLO numbers can have bugs at any stage: the source metric, the aggregation, the time-window calculation, the rendering. Validating the pipeline end to end is the discipline that makes the SLO numbers themselves trustworthy.

What data quality validation actually requires:

Source metrics correct: The metric pipeline reads from somewhere: load balancer logs, application instrumentation, synthetic probes. Each source can be inaccurate in its own way. Validation confirms that the source produces the values the pipeline expects, in the units it expects, at the cadence it expects.
Sampling unbiased: Some metric pipelines sample data rather than capturing every event. A 1% sample produces unbiased aggregates if the sampling is random, and biased aggregates if the sampling correlates with anything (time of day, request type, source IP). The bias check confirms the sample is representative.
Cross-check with other sources: The SLO computed from one source should agree with the SLO computed from another. Server-side error rate should match client-side error rate within sampling tolerance. Database-side query latency should be consistent with the query latency the application reports, allowing for network overhead. Disagreement beyond those tolerances points to a bug in one of the sources, or to failures that only one vantage point can observe.
Validate methodology: The SLO formula, the numerator definition, the denominator definition, and the time window are each a methodology choice. Each gets documented, and the validation confirms that the documentation matches the code.
Reproducibility: Anyone should be able to take the documented methodology and reproduce the SLO number from the source data. If they cannot, the methodology is incomplete or the data is hidden. Reproducibility is what makes the SLO defensible to auditors and stakeholders.
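
A minimal sketch of the methodology and cross-check items above, assuming an availability SLI defined as good events divided by total events over one window. The WindowCounts type, the function names, the sample counts, and the tolerance value are all hypothetical and chosen for illustration; a real pipeline would read the counts from its own sources.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WindowCounts:
    """Raw counts for one SLO window from one source (hypothetical shape)."""
    good: int   # numerator: events that met the objective
    total: int  # denominator: all valid events in the window


def availability(counts: WindowCounts) -> float:
    """SLI = good / total. An empty window has no defined SLI."""
    if counts.total == 0:
        raise ValueError("no events in window; SLI is undefined")
    return counts.good / counts.total


def sources_agree(a: WindowCounts, b: WindowCounts, tolerance: float) -> bool:
    """True when the two sources' SLIs differ by at most `tolerance`."""
    return abs(availability(a) - availability(b)) <= tolerance


# Illustrative numbers only, not measurements.
server_side = WindowCounts(good=99_900, total=100_000)  # e.g. load balancer logs
client_side = WindowCounts(good=99_850, total=100_000)  # e.g. synthetic probes

print(availability(server_side))
print(sources_agree(server_side, client_side, tolerance=0.001))
```

Keeping the numerator and denominator as explicit fields means the documented definition and the code can be compared line by line, and anyone holding the raw counts can recompute the same number.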

Data quality validation is the foundation. Without it, every other SLO discipline operates on potentially wrong data.
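
One way to approach the sampling bias check is to compare how each category of event is represented in the sample against the full population. The sketch below assumes events can be labeled by a single attribute such as request type; the function names, the labels, and the 5-point gap threshold are hypothetical, and a production check might prefer a formal statistical test.

```python
from collections import Counter


def category_shares(labels):
    """Map each label to its fraction of the given events."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def biased_categories(population, sample, max_gap=0.05):
    """Labels whose share in the sample differs from the population by more than max_gap."""
    pop = category_shares(population)
    smp = category_shares(sample)
    return sorted(
        label
        for label in pop.keys() | smp.keys()
        if abs(pop.get(label, 0.0) - smp.get(label, 0.0)) > max_gap
    )


# Illustrative data: 80% reads and 20% writes in the full population.
population = ["read"] * 800 + ["write"] * 200
fair_sample = ["read"] * 79 + ["write"] * 21
skewed_sample = ["read"] * 98 + ["write"] * 2

print(biased_categories(population, fair_sample))    # no label exceeds the gap
print(biased_categories(population, skewed_sample))  # both labels exceed the gap
```

The same comparison can be repeated for each attribute the sampling might correlate with, such as hour of day or source IP range.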

Missing data

The most insidious data quality issue is missing data. The metric pipeline stops producing values, the SLO calculation keeps reading the last good value, and the dashboard shows healthy numbers while the system is unhealthy. Freshness validation catches this class of issue.

Stale metrics make the SLO look better than reality: If the metric pipeline froze while the system was healthy, the SLO continues to report healthy. The system might be in a serious incident and the dashboard would not show it. Detecting staleness is itself a critical alert.
Detect by freshness check: Each metric has an expected update cadence. The validation alerts if the metric has not updated for longer than expected. The alert is independent of the metric's value; it fires on the absence of fresh data.
Flag stale data on the dashboard: The SLO dashboard explicitly shows when data is stale. Rather than displaying the last good value as if it were current, the display shows "data stale; current SLO unknown." The visualization is honest; it does not invent confidence the data does not support.
Heartbeat metric: A synthetic heartbeat metric flows continuously through the pipeline. The heartbeat provides a known signal: as long as it is updating, the pipeline is alive. If the heartbeat stops, the pipeline is broken, regardless of whether application metrics appear to be flowing.
Page on detected staleness: When staleness is detected, the on-call engineer gets paged. The metric pipeline is critical infrastructure, and its failure is treated as a critical incident. Letting the pipeline degrade silently is the worst case.
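
The freshness check and the honest dashboard display described above can be sketched as follows. This assumes each metric, or the heartbeat, exposes the timestamp of its most recent sample; the function names, the 60-second cadence, and the grace factor of 2 are hypothetical choices for illustration.

```python
from datetime import datetime, timedelta, timezone


def is_stale(last_update: datetime, now: datetime,
             expected_cadence: timedelta, grace_factor: float = 2.0) -> bool:
    """Stale when the newest sample is older than grace_factor times the cadence.

    The check looks only at timestamps, never at the metric's value.
    """
    return now - last_update > expected_cadence * grace_factor


def render_slo(value: float, last_update: datetime, now: datetime,
               expected_cadence: timedelta) -> str:
    """Show the SLO only when its data is fresh; otherwise say so explicitly."""
    if is_stale(last_update, now, expected_cadence):
        return "data stale; current SLO unknown"
    return f"{value:.2%}"


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
cadence = timedelta(seconds=60)            # assumed update cadence
fresh = now - timedelta(seconds=45)        # updated within the cadence
frozen = now - timedelta(minutes=10)       # pipeline stopped ten minutes ago

print(render_slo(0.9995, fresh, now, cadence))
print(render_slo(0.9995, frozen, now, cadence))
```

Applying is_stale to the heartbeat's last timestamp gives the paging condition: it fires on silence in the pipeline, whatever the application metrics last reported.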

Missing data detection is the unfashionable side of SLO discipline. Its absence produces dashboards that lie; its presence produces dashboards that tell the truth.

Anomalies

The third validation category is anomaly handling. Real-world metric data has anomalies: a single instance of extreme latency, a single second of zero traffic, a brief data ingestion gap. Each can distort the aggregate SLO calculation if handled naively.

Outliers can dominate the aggregate: A single 60-second response mixed in with thousands of 100ms responses skews the average. P99 and other percentile metrics are far less sensitive to a single extreme value, but the discipline still applies: validation confirms that outliers are handled as the methodology intends.
Validate distribution shape: Latency distributions are normally right-skewed with a long tail. Anomalies violate this shape. A bimodal distribution, where most requests are fast and a separate cluster is very slow, indicates two distinct populations; the SLO calculation should reflect them.
Document anomaly handling: The methodology specifies how anomalies are handled: clipped to a maximum value, excluded from the calculation, or included with attribution. Each choice has consequences; the documentation makes them explicit.
Investigate large divergences: When the validation flags an anomaly, investigate whether it represents a real issue (a bug producing genuinely bad performance) or a measurement artifact (clock skew, a network blip in the metric pipeline). Each calls for a different response.
Anomaly-aware aggregates: Where possible, the SLO calculation uses methods robust to anomalies (medians, trimmed means, percentiles) rather than methods sensitive to outliers (raw means). The robust methods produce more reliable signals.
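
To illustrate how much a single outlier moves each aggregate, the sketch below compares a raw mean, a median, and a trimmed mean on synthetic latencies. The data is invented for the example, and trimmed_mean is a hypothetical helper written here with Python's standard statistics module, not a function from any SLO tool.

```python
import statistics


def trimmed_mean(values, trim_fraction=0.01):
    """Mean after dropping trim_fraction of the values from each end."""
    ordered = sorted(values)
    k = int(len(ordered) * trim_fraction)  # number of values to drop per end
    kept = ordered[k:len(ordered) - k] if k else ordered
    return statistics.fmean(kept)


# Synthetic data: 999 responses at 100 ms plus one 60-second outlier.
latencies_ms = [100.0] * 999 + [60_000.0]

raw_mean = statistics.fmean(latencies_ms)
median = statistics.median(latencies_ms)
trimmed = trimmed_mean(latencies_ms, trim_fraction=0.01)

print(f"raw mean:     {raw_mean:.1f} ms")
print(f"median:       {median:.1f} ms")
print(f"trimmed mean: {trimmed:.1f} ms")
```

Here the one outlier lifts the raw mean by roughly 60% while the median and trimmed mean stay at the typical value. Whichever aggregate is chosen, the choice belongs in the documented methodology, since trimming also hides real slow requests.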

SLO validation is the quality discipline that makes SLO numbers trustworthy. Nova AI Ops runs continuous validation across the SLO pipeline, surfaces the data quality issues, and produces the audit artifacts that confirm the SLO calculation is reliable rather than approximated.