Defining SLIs for Data Pipelines

Pipeline SLIs differ from request-response SLIs. This article covers the three dimensions that matter, the metric definitions, and the alerting that catches drift.

SLIs (service-level indicators) for data pipelines are different from SLIs for online services. Pipelines are not request-driven; their outputs are datasets, not responses. The relevant dimensions are freshness, completeness, and correctness. Each dimension produces an SLI that the pipeline team can target and report against.

Freshness

What freshness measures:

How old is the data the pipeline produced? Freshness measures the lag between when data was generated upstream and when the pipeline made it available downstream. It matters to any downstream consumer that needs recent data.
Freshness SLI: 95% of partitions arrive within 30 minutes of source. The specific target depends on the pipeline: 30 minutes is a typical near-real-time target, while some pipelines target seconds and others hours. Whatever the number, the SLI should be specific and measurable.
Critical for downstream consumers that depend on recency: Dashboards, real-time analytics, fraud detection, and similar uses depend on freshness. Stale data in these contexts is the same as wrong data, and the freshness SLI captures this.
Per-partition or per-batch: Freshness is measured at the partition or batch level. The SLI is the percentage of partitions (or batches) that met the freshness target.
Trend over time: The freshness trend matters as much as the point-in-time value. Improving freshness indicates the pipeline is becoming more reliable; degrading freshness is a signal worth investigating.

Freshness is the most user-visible pipeline SLI. Downstream consumers notice stale data directly.
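
As a rough illustration of the per-partition calculation described above, the following Python sketch computes the fraction of partitions that met a 30-minute target. The `Partition` record and its field names are hypothetical, and it assumes the orchestrator can supply both timestamps for every partition.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Partition:
    # Hypothetical record; the field names are illustrative only.
    source_time: datetime     # when the data was generated upstream
    available_time: datetime  # when the pipeline made it available downstream


def freshness_sli(partitions, target=timedelta(minutes=30)):
    """Return the fraction of partitions whose lag is within the target."""
    if not partitions:
        return None  # no partitions: the SLI is undefined, not 100%
    on_time = sum(
        1 for p in partitions if p.available_time - p.source_time <= target
    )
    return on_time / len(partitions)


# Five partitions with lags of 5, 12, 29, 30, and 45 minutes.
base = datetime(2024, 1, 1, 0, 0)
lags_minutes = [5, 12, 29, 30, 45]
partitions = [Partition(base, base + timedelta(minutes=m)) for m in lags_minutes]

sli = freshness_sli(partitions)  # 4 of 5 partitions are on time: 0.8
meets_target = sli >= 0.95       # False: below the 95% objective
```

Returning `None` for an empty window is a design choice in this sketch: a window with no partitions is more likely a stalled pipeline than a perfect one, so it should not report as 100%.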

Completeness

Completeness measures whether the pipeline processed all the data it was supposed to. A pipeline that runs successfully but processed only 80% of the expected records has a completeness problem. The SLI captures this.

Did the pipeline process all expected records? The pipeline's input volume is known (or can be estimated), and the output volume is observed. The ratio of observed to expected volume is the completeness measure.
Completeness SLI: 99% of expected daily volume. The target is high because gaps in completeness usually indicate real problems. Some loss is acceptable (occasional bad records); large gaps need investigation.
Drops indicate upstream issues: A drop in completeness might mean the upstream produced less data, or that the pipeline lost data. Either way, the drop warrants investigation.
Transformation bugs: A bug in the transformation might silently drop records. The completeness SLI catches this; without it, the bug might persist for weeks before anyone notices.
Schema mismatches: Upstream schema changes that the pipeline does not handle correctly produce dropped records. The completeness SLI catches the drops, and the team can then investigate and update the pipeline.

Completeness is the data-integrity SLI. Without it, partial pipeline failures masquerade as success.
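
The ratio described above can be sketched in a few lines of Python. This is a minimal example, assuming the expected daily volume is already known or estimated; the function names and the shape of the `daily` mapping are hypothetical.

```python
def completeness_ratio(observed, expected):
    """Return observed / expected, or None when no volume was expected."""
    if expected <= 0:
        return None
    # Not capped at 1.0: a ratio above 1 can point to duplicates or to an
    # expected-volume estimate that is too low, which is worth seeing.
    return observed / expected


def days_below_target(daily_counts, target=0.99):
    """Map each day that missed the completeness target to its ratio.

    daily_counts: dict of day -> (observed_records, expected_records).
    """
    below = {}
    for day, (observed, expected) in daily_counts.items():
        ratio = completeness_ratio(observed, expected)
        if ratio is not None and ratio < target:
            below[day] = ratio
    return below


daily = {
    "2024-01-01": (1_000_000, 1_000_000),
    "2024-01-02": (995_000, 1_000_000),  # 99.5%: within target
    "2024-01-03": (800_000, 1_000_000),  # 80%: a run that "succeeded" partially
}
below = days_below_target(daily)  # only 2024-01-03 is flagged
```

The third day is the case the section warns about: the job may have exited cleanly, yet one record in five is missing.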

Correctness

Correctness is the hardest pipeline SLI to measure. It requires comparing the pipeline's output to ground truth; the ground truth is not always available. Sample-based verification is the practical compromise.

Sample-based: A small random sample of inputs is processed both by the pipeline and by an independent reference (manual computation, an alternative implementation, or a golden dataset). The outputs are compared.
Pick a small subset and verify that outputs match expectations: The sample size is calibrated for statistical confidence. A few hundred samples give a usable estimate while keeping the cost of processing them manually or independently bounded.
Correctness SLI: 99% sample agreement. The target is high. Observed 99% agreement is an estimate that the pipeline produces the right output for about 99% of cases, subject to sampling error; agreement clearly below that indicates a real correctness issue.
Hardest to measure: The independent reference is the challenge. Generating it requires manual work, an alternative implementation, or a golden dataset. The investment is significant; the SLI is correspondingly valuable.
Worth it for high-stakes pipelines: Pipelines that produce regulatory reports, financial data, or customer-facing analytics warrant the correctness investment. The cost of wrong outputs in these contexts dwarfs the cost of measuring correctness.
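
To make the sampling step concrete, here is a small Python sketch that draws a random sample, compares pipeline output against a reference, and attaches a Wilson score interval to the agreement rate. The interval is this sketch's addition rather than something the SLI definition above requires, and `pipeline_fn`, `reference_fn`, and the toy bug are hypothetical stand-ins for a real transformation and its independent reference.

```python
import math
import random


def sample_agreement(inputs, pipeline_fn, reference_fn, sample_size, seed=0):
    """Compare pipeline and reference outputs on a random sample of inputs."""
    rng = random.Random(seed)  # fixed seed so a check can be reproduced
    sample = rng.sample(inputs, min(sample_size, len(inputs)))
    matches = sum(1 for x in sample if pipeline_fn(x) == reference_fn(x))
    return matches, len(sample)


def wilson_interval(matches, n, z=1.96):
    """Wilson score interval for an agreement rate (z=1.96 is about 95%)."""
    if n == 0:
        return None
    p = matches / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


# Toy stand-ins: the "pipeline" is wrong for every 100th input.
def reference_fn(x):
    return x * 2


def pipeline_fn(x):
    return x * 2 + 1 if x % 100 == 0 else x * 2


matches, n = sample_agreement(list(range(1000)), pipeline_fn, reference_fn, 300)
low, high = wilson_interval(matches, n)
agreement = matches / n
```

With 297 matches out of 300, the interval works out to roughly 97% to 99.7%, which is why a single sample sitting just under 99% is weaker evidence of a regression than a sustained downward trend.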

Defining SLIs for pipelines brings the rigor of online-service SLOs to data engineering. Nova AI Ops integrates with pipeline orchestrators and data quality tools, surfaces freshness, completeness, and correctness trends, and produces the per-pipeline SLO report that the data engineering team uses to drive quality.