CI/CD for Machine Learning: How MLOps Differs
MLOps is CI/CD with extra stages: data validation, model eval, drift monitoring. Same discipline; broader scope.
Why ML CI/CD differs
An ML pipeline has failure surfaces that a conventional software pipeline lacks. Data flows into training, training produces a model, the model is evaluated, and only then is it deployed; each of those stages can fail in ways that a compile-and-test pipeline never sees.
- Data is part of the build. Bad data produces bad models; the build artefact depends on the data, not just the code.
- Model is the artefact. Output is a model file plus metadata, not a binary; storage, lineage, and versioning differ.
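- Eval is a gate. A model must clear benchmark thresholds before it deploys; the gate compares scores against thresholds instead of running a test suite that simply passes or fails.
- Drift after deploy. Models silently degrade as production data shifts; a standard CI/CD pipeline stops at deploy and has no stage that detects this.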
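The evaluation gate described above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed implementation: the `Threshold` class, the `evaluation_gate` function, the metric names, and the numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    """Minimum acceptable score for one metric (hypothetical structure)."""
    metric: str
    minimum: float


def evaluation_gate(scores: dict[str, float], thresholds: list[Threshold]) -> list[str]:
    """Return failure messages; an empty list means the gate passes."""
    failures = []
    for t in thresholds:
        value = scores.get(t.metric)
        if value is None:
            # A metric that was never computed counts as a failure.
            failures.append(f"{t.metric}: missing score")
        elif value < t.minimum:
            failures.append(f"{t.metric}: {value:.3f} < {t.minimum:.3f}")
    return failures


# Illustrative numbers only; real thresholds come from the team's benchmarks.
thresholds = [Threshold("accuracy", 0.90), Threshold("recall", 0.80)]
candidate_scores = {"accuracy": 0.93, "recall": 0.76}

failures = evaluation_gate(candidate_scores, thresholds)
if failures:
    print("Gate failed:", "; ".join(failures))
else:
    print("Gate passed")
```

Returning the list of failures, not a single boolean, lets the pipeline log which metric blocked the deploy.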
Four extra stages
1. Data validation. Check schema, distribution, and freshness before training starts.
2. Model training. Runs are reproducible and experiments are tracked.
3. Model evaluation. Score against benchmark sets and production-like sets.
4. Deployment and drift monitoring. Watch for performance degradation after release.
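As a rough sketch of the first stage, the three data checks (schema, distribution, freshness) might look like the following in plain Python. The column names, the mean range, the 24-hour staleness limit, and the `validate_batch` function are assumptions made for illustration; a real pipeline would more likely express these as rules in a validation library.

```python
from datetime import datetime, timedelta, timezone
from statistics import mean

# Hypothetical expectations for an illustrative dataset.
EXPECTED_COLUMNS = {"user_id": int, "amount": float}
AMOUNT_MEAN_RANGE = (10.0, 100.0)  # acceptable range for the batch mean
MAX_AGE = timedelta(hours=24)      # older batches count as stale


def validate_batch(rows, extracted_at, now=None):
    """Return validation errors; an empty list means the batch may proceed."""
    now = now or datetime.now(timezone.utc)
    errors = []

    # Schema: every row has the expected columns with the expected types.
    for i, row in enumerate(rows):
        for column, expected_type in EXPECTED_COLUMNS.items():
            if not isinstance(row.get(column), expected_type):
                errors.append(f"row {i}: {column} is not {expected_type.__name__}")

    # Distribution: a coarse check that the batch mean sits in a known range.
    amounts = [r["amount"] for r in rows if isinstance(r.get("amount"), float)]
    if amounts:
        low, high = AMOUNT_MEAN_RANGE
        batch_mean = mean(amounts)
        if not low <= batch_mean <= high:
            errors.append(f"amount mean {batch_mean:.2f} outside [{low}, {high}]")
    else:
        errors.append("no valid amount values")

    # Freshness: reject batches extracted too long ago.
    if now - extracted_at > MAX_AGE:
        errors.append("batch is stale")

    return errors


batch = [{"user_id": 1, "amount": 20.0}, {"user_id": 2, "amount": 40.0}]
print(validate_batch(batch, extracted_at=datetime.now(timezone.utc)))
```

A non-empty result would stop the pipeline before training, which is the point of treating data as part of the build.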
Tooling per stage
Each stage has its preferred tools. The ecosystem in 2026 is mature; pick by team familiarity, not by hype.
- Validation. Great Expectations, Pandera; assert schema, distribution, freshness as part of the pipeline.
- Training. MLflow, Weights & Biases, Kubeflow; experiment tracking plus reproducible runs.
- Eval. Custom benchmarks alongside standard ones; team-specific scores plus industry-standard scores.
- Drift. Evidently, Arize, WhyLabs; production model monitoring with alerting.
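To make the drift stage concrete, here is a minimal sketch of one common drift statistic, the Population Stability Index (PSI), written in plain Python. It is not how any of the tools above compute drift; it is a stand-in technique chosen for illustration. The function name, the bin count, and the 0.25 alert threshold (a frequently quoted rule of thumb, treated here as an assumption) should be tuned per feature.

```python
import math

DRIFT_THRESHOLD = 0.25  # assumed rule-of-thumb alert level; tune per feature


def population_stability_index(reference, current, bins=10):
    """PSI between two numeric samples, binned over the reference range."""
    low, high = min(reference), max(reference)
    width = (high - low) / bins or 1.0  # avoid zero width when all values are equal

    def bin_shares(values):
        counts = [0] * bins
        for v in values:
            # Clamp out-of-range values into the first or last bin.
            index = int((v - low) / width)
            counts[min(max(index, 0), bins - 1)] += 1
        # A small floor keeps empty bins from producing log(0) or division by zero.
        return [max(c / len(values), 1e-6) for c in counts]

    ref_shares = bin_shares(reference)
    cur_shares = bin_shares(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_shares, cur_shares))


# Illustrative data: training-time feature values vs. a shifted production sample.
training_values = [i / 100 for i in range(100)]
production_values = [0.5 + i / 200 for i in range(100)]

psi = population_stability_index(training_values, production_values)
if psi > DRIFT_THRESHOLD:
    print(f"Drift alert: PSI = {psi:.2f}")
```

In practice a check like this would run on a schedule against recent production inputs and feed an alerting channel, not print to the console.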
Team structure
MLOps requires cross-functional ownership. Models stall between data team and platform team unless ownership spans the whole pipeline.
- Data engineers. Own data validation and everything upstream of training.
- ML engineers. Own training and evaluation; the model lifecycle from training data to deployable artefact.
- Platform engineers. Own deployment, drift monitoring, rollback; the production-side discipline.
- Without end-to-end ownership. Models stall between teams, nobody owns the gap, and production never benefits from the model.
Antipatterns
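- Deploying models through an unmodified software CI pipeline. It skips data validation and model evaluation.
- No drift monitoring. Models silently degrade and nobody notices.
- One person owns the full pipeline. The bus factor is 1; ownership should span the pipeline as a team responsibility, not rest on an individual.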
What to do this week
Three moves. (1) Add the extra stages to one pipeline first. (2) Measure deploy frequency and MTTR before and after the change. (3) Document the outcome so the next team starts from data.
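For the measurement step, deploy frequency and MTTR can be computed from timestamps most teams already have. The sketch below assumes deploy times and incident (detected, restored) pairs are available as `datetime` values; the function names and sample data are hypothetical.

```python
from datetime import datetime, timedelta


def deploys_per_week(deploy_times, window_start, window_end):
    """Deploys inside [window_start, window_end), normalised to a 7-day week."""
    in_window = [t for t in deploy_times if window_start <= t < window_end]
    weeks = (window_end - window_start) / timedelta(weeks=1)
    return len(in_window) / weeks


def mean_time_to_restore(incidents):
    """Average of (restored - detected) across incidents, as a timedelta."""
    durations = [restored - detected for detected, restored in incidents]
    return sum(durations, timedelta()) / len(durations)


# Illustrative data: four deploys in a two-week window, two incidents.
start = datetime(2026, 1, 1)
end = start + timedelta(days=14)
deploys = [start + timedelta(days=d) for d in (1, 4, 8, 12)]
incidents = [
    (datetime(2026, 1, 3, 10, 0), datetime(2026, 1, 3, 11, 0)),  # 1 hour
    (datetime(2026, 1, 9, 14, 0), datetime(2026, 1, 9, 17, 0)),  # 3 hours
]

print(deploys_per_week(deploys, start, end))
print(mean_time_to_restore(incidents))
```

Running the same two functions over a window before the change and a window after it gives the before/after comparison.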