Pipeline Observability
Monitor the CI/CD pipeline itself, not just the services it deploys.
Metrics
Pipeline duration: per-stage and end-to-end.
Success rate: per-stage and overall.
Slowest stages: identifies the bottlenecks most worth optimising.
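Here is a minimal Python sketch of how the three metrics above might be computed from per-stage run records. The `StageRun` record and its field names are hypothetical and not taken from any particular CI system. "Slowest" is ranked by median duration here, which is one reasonable choice; mean or p95 would also work.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import median

@dataclass
class StageRun:
    run_id: str      # one pipeline execution (hypothetical field name)
    stage: str       # e.g. "build", "test", "deploy"
    start_s: float   # start time, seconds since epoch
    end_s: float     # end time, seconds since epoch
    succeeded: bool

def stage_durations(runs):
    """Map each stage name to the list of its observed durations (seconds)."""
    out = defaultdict(list)
    for r in runs:
        out[r.stage].append(r.end_s - r.start_s)
    return dict(out)

def end_to_end_durations(runs):
    """Wall-clock time per pipeline run: last stage end minus first stage start."""
    first, last = {}, {}
    for r in runs:
        first[r.run_id] = min(first.get(r.run_id, r.start_s), r.start_s)
        last[r.run_id] = max(last.get(r.run_id, r.end_s), r.end_s)
    return {rid: last[rid] - first[rid] for rid in first}

def stage_success_rates(runs):
    """Fraction of executions of each stage that succeeded."""
    total, ok = defaultdict(int), defaultdict(int)
    for r in runs:
        total[r.stage] += 1
        ok[r.stage] += r.succeeded  # bool counts as 0 or 1
    return {s: ok[s] / total[s] for s in total}

def overall_success_rate(runs):
    """A run counts as successful only if every stage in it succeeded.
    Assumes `runs` is non-empty."""
    run_ok = {}
    for r in runs:
        run_ok[r.run_id] = run_ok.get(r.run_id, True) and r.succeeded
    return sum(run_ok.values()) / len(run_ok)

def top_slow_stages(runs, n=3):
    """Stage names ranked by median duration, slowest first."""
    med = {s: median(d) for s, d in stage_durations(runs).items()}
    return sorted(med, key=med.get, reverse=True)[:n]
```

For example, with two runs in which `test` took 240 s and 110 s and `build` took 60 s and 90 s, `top_slow_stages(runs, n=1)` returns `["test"]`.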
Alerts
Pipeline broken for more than 1 hour: page the on-call engineer. Catches stuck or persistently failing pipelines.
Duration regression: a run 50% slower than baseline triggers a warning. Catches creeping inefficiency.
Per-team success rate drop: surfaces team-specific issues.
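The three alert rules could be written as plain predicates. The sketch below rests on several assumptions. The `Run` record and its status strings are hypothetical. The duration baseline is taken as the median of recent runs. The source gives no threshold for the per-team drop, so `max_drop` (10 percentage points) is an illustrative placeholder only. The first predicate would drive a page and the second a warning; the source does not specify a severity for the third.

```python
from dataclasses import dataclass
from statistics import median
from typing import List

ONE_HOUR_S = 3600
REGRESSION_FACTOR = 1.5  # "50% slower than baseline"

@dataclass
class Run:
    started_s: float  # start time, seconds since epoch
    status: str       # hypothetical values: "success", "failed", "running"

def broken_for_over_an_hour(runs: List[Run], now_s: float) -> bool:
    """Page-level check. `runs` is one pipeline's history, oldest first."""
    if not runs:
        return False
    latest = runs[-1]
    # Stuck: the newest run has been executing for more than an hour.
    if latest.status == "running" and now_s - latest.started_s > ONE_HOUR_S:
        return True
    # Persistently failing: walk back to the start of the trailing failure
    # streak (approximated by the first failing run's start time).
    streak_start = None
    for run in reversed(runs):
        if run.status == "success":
            break
        if run.status == "failed":
            streak_start = run.started_s
    return streak_start is not None and now_s - streak_start > ONE_HOUR_S

def duration_regressed(current_s: float, recent_s: List[float]) -> bool:
    """Warning-level check against a median-of-recent-runs baseline (an assumption)."""
    if not recent_s:
        return False
    return current_s >= REGRESSION_FACTOR * median(recent_s)

def team_success_dropped(previous_rate: float, current_rate: float,
                         max_drop: float = 0.10) -> bool:
    """True if a team's success rate fell by more than `max_drop` between two
    windows. The 0.10 default is a hypothetical placeholder, not from the source."""
    return previous_rate - current_rate > max_drop
```

These checks are best evaluated on a schedule rather than only when a run completes. A stuck run never completes, so a completion-triggered check would never fire for it.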
Traceability
Per-deploy lineage: source commits, artifacts, deploys, environments.
Audit trail: who triggered, when, with what changes.
Cross-pipeline correlation: trace a prod issue back to the staging pipeline runs that should have caught it.
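One way to support lineage, audit, and cross-pipeline correlation is to store a single record per deploy and query it. The sketch below assumes a hypothetical `DeployRecord` schema in which an artifact digest links a staging deploy to the prod deploy of the same build. Real systems may key on image tags, build IDs, or similar identifiers instead.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class DeployRecord:
    deploy_id: str
    environment: str       # e.g. "staging", "prod"
    artifact_digest: str   # identifies the built artifact (hypothetical field)
    commits: List[str]     # source commit SHAs in the artifact: "what changes"
    pipeline_run_id: str   # pipeline run that performed the deploy
    triggered_by: str      # audit trail: who
    triggered_at_s: float  # audit trail: when (seconds since epoch)

def lineage(record: DeployRecord) -> dict:
    """Commits -> artifact -> deploy -> environment, plus audit fields."""
    return {
        "commits": list(record.commits),
        "artifact": record.artifact_digest,
        "deploy": record.deploy_id,
        "environment": record.environment,
        "triggered_by": record.triggered_by,
        "triggered_at_s": record.triggered_at_s,
    }

def staging_runs_for(prod_deploy: DeployRecord, all_deploys: List[DeployRecord],
                     staging_env: str = "staging") -> List[str]:
    """Pipeline runs that deployed the same artifact to staging before it
    reached prod, i.e. the runs that had a chance to catch a prod issue."""
    return [
        d.pipeline_run_id
        for d in all_deploys
        if d.environment == staging_env
        and d.artifact_digest == prod_deploy.artifact_digest
        and d.triggered_at_s < prod_deploy.triggered_at_s
    ]
```

Keying correlation on the artifact rather than the commit avoids false matches when the same commit is rebuilt with different dependencies.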
Why it matters
Pipeline health drives developer velocity. A broken or slow pipeline delays every engineer waiting on it, so the cost compounds across the team.
Investing in pipeline observability pays back in faster ship cycles and regressions caught earlier.
Run a quarterly pipeline health review: prioritise the top issues and allocate engineering time to fix them.