The Multi-Agent OS for SRE & DevOps

CI/CD: Continuous Integration and Delivery Explained (2026 Guide)

The pipeline is the backbone of modern software delivery: the machine that turns a commit into a verified, running release. This is the focused 2026 guide to CI/CD pipelines specifically. It covers what continuous integration, delivery, and deployment really mean, the anatomy of a pipeline stage by stage, the deployment strategies that ship without downtime, the DORA metrics that grade pipeline quality, how pipelines meet reliability and AI, a 10-point checklist, and a 90-day plan.

Published May 2026. By Dr. Samson Tanimawo, Nova AI Ops.
Figure: CI/CD pipeline diagram showing source, build, test, security scan, artifact, deploy, and verify stages feeding into automated rollback and observability.

What CI/CD is: the three things people conflate

CI/CD is the automated pipeline that takes a code change from a commit all the way to running, verified, in production. The two letters hide three distinct practices that get conflated constantly, and the confusion is not pedantic: each one implies a different level of automation, a different risk posture, and a different organizational commitment. Naming them precisely is the first step to building or maturing a pipeline that actually fits your team.

Continuous integration is the practice of merging every developer's work into a shared main branch many times a day, and proving each merge with an automated build and an automated test run. The goal is to catch integration problems within minutes of the commit that caused them, while the change is small. Continuous delivery extends that: the tested artifact is always kept in a deployable state, so any green build can be released to production with a single click whenever a human decides to. Continuous deployment goes one step further and removes the human gate entirely: every change that passes the full pipeline ships to production automatically, with no manual approval.

The first CD, delivery, is about always being ready to ship. The second CD, deployment, is about automatically shipping. They share an abbreviation, which is exactly why people use them interchangeably when they mean very different things. Almost every team should pursue continuous delivery. Whether to go all the way to continuous deployment is a deliberate, situation-specific choice covered later on this page.

Why does the pipeline matter so much? Because it is the single path every change travels on its way to your users. Make that path fast, automated, and safe, and you raise both your delivery speed and your reliability at the same time. Leave it slow and manual, and every release becomes a risky, stressful event. CI/CD is one focused part of the wider DevOps automation surface, and it sits inside the broader DevOps culture of shared ownership between the people who build software and the people who run it. This page is the deep dive on the pipeline itself.

Continuous integration in depth

Continuous integration is the foundation, and it is more a discipline than a tool. The name comes from a simple, hard-won lesson: integrating code is painful in proportion to how long you wait to do it. If ten engineers each work in isolation for two weeks and then merge, the conflicts, the incompatible assumptions, and the broken interfaces all surface at once in a miserable big-bang integration. If instead each engineer merges to main several times a day, every integration is tiny, and any problem is caught within minutes while its author still has full context.

Trunk-based development

The branching model that makes continuous integration real is trunk-based development. Everyone commits to short-lived branches cut from main, ideally living less than a day, and merges back fast. There are no long-running feature branches that drift for weeks. The opposite pattern, where each feature gets a branch that lives until the feature is fully finished, quietly defeats continuous integration: the team is technically using a CI server, but the actual integration of divergent work still happens rarely and painfully. Short-lived branches plus frequent merges are what the practice requires.

Automated builds and fast test feedback

Every merge to main triggers an automated build and an automated test run. The single most important property of that test suite is speed. If the feedback takes forty minutes, engineers context-switch away, stop watching, and start batching changes to avoid the wait, which erodes the whole discipline. The aim is fast feedback measured in minutes: run the quick unit tests first and fail loudly the moment anything breaks, then run the slower integration and end-to-end suites. Parallelizing test execution and running only the tests affected by a change are the usual levers for keeping feedback fast as the codebase grows.
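As a rough illustration of the two levers above (running only affected tests, and running fast suites first), here is a minimal Python sketch. The `Suite` type, the directory-to-suite mapping, and the runtime estimates are all hypothetical; real test-selection tools derive this information from build graphs or coverage data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Suite:
    name: str
    covers: frozenset   # top-level source directories the suite exercises (assumed mapping)
    est_seconds: int    # rough historical runtime, used only for ordering


def plan_test_run(changed_files, suites):
    """Return the names of the suites touched by a change, fastest first."""
    changed_dirs = {path.split("/", 1)[0] for path in changed_files}
    affected = [s for s in suites if s.covers & changed_dirs]
    return [s.name for s in sorted(affected, key=lambda s: s.est_seconds)]


suites = [
    Suite("e2e-checkout", frozenset({"billing", "web"}), 600),
    Suite("unit-billing", frozenset({"billing"}), 20),
    Suite("unit-search", frozenset({"search"}), 15),
    Suite("integration-billing", frozenset({"billing"}), 120),
]

plan = plan_test_run(["billing/invoice.py", "billing/tax.py"], suites)
print(plan)  # ['unit-billing', 'integration-billing', 'e2e-checkout']
```

The search suite is skipped because nothing under `search/` changed, and the twenty-second unit suite runs before the ten-minute end-to-end suite, so most failures would surface early.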

Keep main green

The cultural rule that ties it together is keep main green. A broken build on main is the team's top priority, ahead of any new feature work, because while main is red nobody can safely integrate on top of it and the whole team is blocked. Mature teams enforce this with branch protection that refuses to merge a change unless the pipeline passes, so a red main becomes rare by construction rather than by heroics. The payoff of high merge frequency on a green main is that integration stops being an event and becomes a continuous, boring, low-risk background hum, which is exactly the goal.

The common failure mode. Many teams say they "do CI" because a pipeline runs on every pull request, while branches still live for a week or two before merging. That is automated testing, which is valuable, but it is not continuous integration. The integration is still infrequent and still risky. The tell is branch lifetime: if your typical branch lives longer than a day, you have a build server, not continuous integration.
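The branch-lifetime tell is easy to check mechanically. The sketch below assumes you can already obtain each open branch's creation time (the dictionary here is hand-written sample data, not output from any real Git tooling) and flags anything older than one day.

```python
from datetime import datetime, timedelta


def stale_branches(branches, now, max_age=timedelta(days=1)):
    """Return names of branches older than max_age, oldest first."""
    old = [(created, name) for name, created in branches.items()
           if now - created > max_age]
    return [name for created, name in sorted(old)]


now = datetime(2026, 5, 4, 12, 0)
open_branches = {
    "fix-typo": datetime(2026, 5, 4, 9, 0),       # three hours old
    "new-billing": datetime(2026, 4, 24, 10, 0),  # ten days old
    "refactor-auth": datetime(2026, 5, 1, 16, 0), # almost three days old
}

print(stale_branches(open_branches, now))  # ['new-billing', 'refactor-auth']
```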

Continuous delivery vs continuous deployment

Both abbreviate to CD, and the only structural difference between them is a single gate: whether a human presses the button before production. That one gate, present or absent, changes the operating model significantly.

Dimension | Continuous delivery | Continuous deployment
Production gate | Human approves the release | No gate; passing change ships automatically
Release cadence | On demand, when the team chooses | Every green commit, continuously
Test coverage required | High | Very high; the tests are the only gate
Rollback maturity | Recommended | Mandatory and fast
Best fit | Regulated, change-window, high-blast-radius | Deep automation, progressive delivery in place
Audit and change control | Explicit human sign-off per release | Encoded in the pipeline and policy

When continuous delivery is the right call. If you operate in a regulated environment that requires a human sign-off on each production change, if releases must land inside an approved change window, or if a single bad deploy has a very large blast radius, the human gate is a feature, not a weakness. Continuous delivery gives you the speed of an always-ready pipeline while keeping the deliberate choice of when to release in human hands.

When continuous deployment is the right call. If you have deep automated test coverage, fast and reliable rollback, and progressive-delivery safety nets so that a bad change reaches very few users before it is caught, removing the human gate removes a bottleneck without adding meaningful risk. The pipeline becomes the gate, and it is a more consistent gatekeeper than a tired human at the end of a sprint.

Release is not the same as deploy

A crucial idea that unlocks both models is that deploying and releasing are different events. Deploy means the new code is running on production servers. Release means users can actually see and use the new behavior. Feature flags separate the two: you deploy the code dark, with the new feature wrapped in a flag that is switched off, so it runs in production but is invisible to users. Later you release it by flipping the flag on, for everyone or for a percentage, with no redeploy. This separation lets even a continuous-deployment team ship code to production constantly while still controlling, carefully and reversibly, when each user-facing change actually goes live.
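One common way to implement "for everyone or for a percentage" is to hash each user into a stable bucket and compare the bucket to the rollout percentage. The following is a minimal sketch under that assumption; the function names and the flag name are hypothetical, and production flag systems add targeting rules, persistence, and audit trails on top.

```python
import hashlib


def bucket(flag, user_id):
    """Map a (flag, user) pair to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % 100


def is_enabled(flag, user_id, rollout_percent):
    """True if this user falls inside the released slice for this flag."""
    return bucket(flag, user_id) < rollout_percent


# Deployed dark: the code is in production but nobody sees the feature.
print(is_enabled("new-checkout", "user-42", 0))    # False
# Fully released: a config change, no redeploy.
print(is_enabled("new-checkout", "user-42", 100))  # True
```

Because the bucket depends only on the flag and the user, a user who sees the feature at 10% keeps seeing it as the percentage rises, and setting the percentage back to zero hides it from everyone again.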


Anatomy of a pipeline, stage by stage

A mature CI/CD pipeline is a sequence of stages, each of which is a gate: the change only advances if the stage passes. Below is the canonical shape. Real pipelines add, reorder, or parallelize stages, but the logical flow from commit to verified release is consistent across teams.

1. Source

A commit or a merge to a watched branch triggers the run. The trigger event, the exact commit, and the person who pushed it are all captured so the run is fully traceable. This is also where pull-request checks run before code ever reaches main.

2. Build

The code is compiled and packaged into a deployable artifact: a binary, a container image, or a bundle. The build must be reproducible, so the same commit always produces the same artifact. Caching dependencies keeps this stage fast as the project grows.

3. Test

Unit tests run first for fast feedback, then integration tests that exercise components together, then end-to-end tests that drive the system the way a user would. Ordering fast tests first means most failures surface in seconds, not after the slow suites finish.

4. Security scanning

Static analysis of the code, dependency scanning for known vulnerable libraries, container image scanning, and secret detection. These gates shift security left, catching issues in the pipeline rather than in a production incident or an audit months later.

5. Artifact

The verified build is published to a registry with an immutable, versioned identity. The exact artifact that passed the tests is the artifact that gets deployed, with no rebuild in between, so what you tested is precisely what you ship.

6. Deploy and verify

The artifact rolls out to staging and then production using a chosen strategy. Post-deploy health checks and observability confirm the release is healthy, and an automated rollback fires if it is not. The deploy is not done until the verify stage says the system is healthy.
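The "what you tested is precisely what you ship" guarantee from the artifact stage is usually enforced with content digests. Here is a toy in-memory sketch of the idea; `ArtifactRegistry` and its methods are invented for illustration and do not mirror any particular registry's API.

```python
import hashlib


def digest_of(content):
    """Content-derived identity: same bytes, same digest."""
    return "sha256:" + hashlib.sha256(content).hexdigest()


class ArtifactRegistry:
    """Toy registry: a version can be published once and never changed."""

    def __init__(self):
        self._digest_by_version = {}
        self._content_by_digest = {}

    def publish(self, version, content):
        digest = digest_of(content)
        existing = self._digest_by_version.get(version)
        if existing is not None and existing != digest:
            raise ValueError(f"{version} is already published with different content")
        self._digest_by_version[version] = digest
        self._content_by_digest[digest] = content
        return digest

    def fetch(self, version, expected_digest):
        """Return the artifact only if it is the one that passed the tests."""
        digest = self._digest_by_version[version]
        if digest != expected_digest:
            raise ValueError("artifact does not match the tested digest")
        return self._content_by_digest[digest]


registry = ArtifactRegistry()
tested = registry.publish("1.4.2", b"compiled bytes")   # recorded by the test stage
to_deploy = registry.fetch("1.4.2", tested)             # checked by the deploy stage
```

The deploy stage asks for the digest recorded at test time rather than a mutable label such as "latest", so a rebuild or an overwritten tag cannot slip in between test and deploy.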

Pipeline as code

The pipeline itself lives as code in the repository, alongside the application it builds. Defining the pipeline declaratively, rather than clicking it together in a web UI, gives it the same benefits the application code already has: it is version-controlled, reviewed in pull requests, diffable, and rolled back by reverting a commit. It also makes the pipeline reproducible across projects and recoverable if the CI system is rebuilt. A pipeline you cannot review in a diff is a pipeline nobody fully understands. Keeping it as code is what makes the whole delivery process auditable and maintainable.
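To make the two ideas in this section concrete (stages as gates, and the pipeline as reviewable data in the repository), here is a deliberately small Python sketch. Real CI systems use their own declarative formats; the `PIPELINE` list and the lambda gates below are stand-ins, with one gate hard-coded to fail so the short-circuit is visible.

```python
def run_pipeline(stages):
    """Run stages in order; stop at the first gate that fails.

    Returns (passed, names_of_stages_that_ran).
    """
    ran = []
    for name, gate in stages:
        ran.append(name)
        if not gate():
            return False, ran
    return True, ran


# The pipeline definition is plain data: it can be diffed, reviewed, and reverted.
PIPELINE = [
    ("source", lambda: True),
    ("build", lambda: True),
    ("test", lambda: True),
    ("security-scan", lambda: False),   # simulated failure, e.g. a leaked secret
    ("artifact", lambda: True),
    ("deploy-and-verify", lambda: True),
]

passed, ran = run_pipeline(PIPELINE)
print(passed, ran)  # False ['source', 'build', 'test', 'security-scan']
```

Nothing after the failed gate runs, so no artifact is published and nothing is deployed from a change that did not pass every earlier stage.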

Deployment strategies and fast rollback

How you push the new artifact into production is where the pipeline meets reliability directly. The goal is to deploy without downtime and to make rolling back fast and boring. Four strategies cover almost every case, and mature teams combine them.

Rolling deployment

Instances are replaced in batches rather than all at once, so the service stays available throughout. During the rollout the old and new versions run side by side, which keeps capacity up but requires that the two versions be compatible, especially around database schemas and API contracts. Rolling is the default in most container orchestrators because it is simple and needs no extra infrastructure, but its rollback is slower because it has to roll the batches back the way it rolled them forward.
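The batch-by-batch behaviour, including the slower rollback, can be sketched as follows. This is a simplified model, assuming a fleet represented as a name-to-version dictionary and a caller-supplied `is_healthy` check; a real orchestrator also handles draining, readiness, and surge capacity.

```python
def rolling_deploy(fleet, new_version, batch_size, is_healthy):
    """Replace instances in batches; undo everything if a batch is unhealthy.

    fleet maps instance name -> running version and is updated in place.
    Returns True if the whole fleet ends up on new_version.
    """
    previous = dict(fleet)
    updated = []
    names = list(fleet)
    for start in range(0, len(names), batch_size):
        batch = names[start:start + batch_size]
        for name in batch:
            fleet[name] = new_version
            updated.append(name)
        if not all(is_healthy(name, new_version) for name in batch):
            # Roll back in reverse order, retracing the rollout.
            for name in reversed(updated):
                fleet[name] = previous[name]
            return False
    return True


fleet = {f"web-{i}": "v1" for i in range(1, 6)}
ok = rolling_deploy(fleet, "v2", batch_size=2, is_healthy=lambda name, v: True)
print(ok, fleet["web-1"], fleet["web-5"])  # True v2 v2
```

Between batches the fleet holds a mix of v1 and v2 instances, which is the side-by-side window that makes schema and API compatibility a requirement.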

Blue/green

Two identical production environments exist: the live one (blue) and an idle one (green). You deploy the new version to green, verify it in isolation, then flip all traffic from blue to green at once. The rollback is the fastest of any strategy because it is just flipping traffic back to blue, which is still running the previous version untouched. The cost is running two full environments, and care is needed so that in-flight requests and shared state, such as the database, are handled cleanly across the switch. This is the model the Nova marketing site itself uses for deploys.
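The traffic flip and the instant rollback can be modelled in a few lines. This sketch treats the two environments as dictionary entries and the router as a single `live` pointer; the class and method names are hypothetical, and it ignores in-flight requests and shared database state, which the paragraph above flags as the hard parts.

```python
class BlueGreen:
    """Two environments; exactly one of them receives traffic."""

    def __init__(self, version):
        self.versions = {"blue": version, "green": version}
        self.live = "blue"

    @property
    def idle(self):
        return "green" if self.live == "blue" else "blue"

    def deploy_to_idle(self, version):
        # The live environment is untouched while the idle one is updated.
        self.versions[self.idle] = version

    def switch(self):
        # Flip all traffic at once; calling it again is the rollback.
        self.live = self.idle

    def serving(self):
        return self.versions[self.live]


env = BlueGreen("v1")
env.deploy_to_idle("v2")
print(env.serving())  # v1 (green has v2 but takes no traffic yet)
env.switch()
print(env.serving())  # v2
env.switch()
print(env.serving())  # v1 (rollback: blue was never modified)
```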

Canary

The new version is released to a small slice of traffic first, perhaps one percent, while its metrics are watched closely. If error rates, latency, and the key business signals stay healthy, the rollout is promoted in steps to larger and larger shares until it serves everyone. If the metrics degrade, the canary is pulled and the blast radius is limited to that initial slice. Canary is the safest strategy for high-risk changes because the system itself, ideally automatically, decides whether to promote or abort based on real production signals.
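A stepwise promote-or-abort loop might look like the sketch below. The traffic steps, the error-rate threshold, and the `error_rate_at` callback are illustrative assumptions; real canary analysis typically compares several metrics against a baseline rather than a single fixed threshold.

```python
def run_canary(steps, error_rate_at, max_error_rate):
    """Walk through traffic percentages; abort at the first unhealthy step.

    error_rate_at(percent) returns the observed error rate at that traffic share.
    Returns ("promoted", final_percent) or ("aborted", percent_where_pulled).
    Assumes steps is non-empty.
    """
    for percent in steps:
        if error_rate_at(percent) > max_error_rate:
            return "aborted", percent
    return "promoted", steps[-1]


steps = [1, 5, 25, 100]

healthy = run_canary(steps, lambda pct: 0.002, max_error_rate=0.01)
degraded = run_canary(steps, lambda pct: 0.002 if pct < 25 else 0.04,
                      max_error_rate=0.01)

print(healthy)   # ('promoted', 100)
print(degraded)  # ('aborted', 25)
```

In the degraded run the regression only shows up under more load, and the rollout stops at a quarter of traffic instead of reaching everyone.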

Feature flags

Feature flags operate at a different layer: they decouple deploy from release entirely. The code ships dark behind a flag, and you turn the feature on with a configuration change, for everyone or a percentage of users, with no redeploy. Rolling back a feature is flipping its flag off, which takes seconds and touches no infrastructure. Most mature teams combine canary or blue/green for the infrastructure-level deploy with feature flags for the product-level release, getting fast, low-risk control at both layers.

The unifying theme across all four is that fast rollback is the real safety net. A pipeline that ships often is only safe if a bad change can be reversed quickly, which is why automated rollback wired to health checks turns a regression from an outage into a non-event. This is the hinge that connects delivery speed to site reliability engineering, and it is where MTTR is won or lost.
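"Automated rollback wired to health checks" can be reduced to a small decision rule. The sketch below assumes a stream of boolean probe results and rolls back after a run of consecutive failures; the threshold of three and the function name are illustrative choices, not a recommendation from the guide.

```python
def verify_or_rollback(previous, candidate, probe_results, failure_threshold=3):
    """Return the version that should be left running after verification.

    probe_results is an iterable of booleans from post-deploy health checks.
    A run of failure_threshold consecutive failures triggers the rollback.
    """
    consecutive_failures = 0
    for ok in probe_results:
        consecutive_failures = 0 if ok else consecutive_failures + 1
        if consecutive_failures >= failure_threshold:
            return previous
    return candidate


# A couple of flaky probes are tolerated.
print(verify_or_rollback("v1", "v2", [True, False, True, False, False, True]))  # v2
# A sustained failure reverts the deploy without a human decision.
print(verify_or_rollback("v1", "v2", [True, False, False, False]))              # v1
```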

CI/CD metrics and the four DORA keys

You cannot improve a pipeline you do not measure, and the most widely adopted measurement framework is DORA, which defines four key metrics for software delivery performance. The elegant thing about the four keys is that they balance throughput against stability, so you cannot game one by sacrificing the others.

DORA metric | What it measures | Elite target
Deployment frequency | How often you ship to production | On demand, multiple per day
Lead time for changes | Commit to running in production | Less than one hour
Change failure rate | Share of deploys that cause a degradation | Under 15%
Failed deployment recovery | Time to restore service after a bad change | Less than one hour

The first two keys, deployment frequency and lead time, measure throughput: how fast and how often you can deliver change. The last two, change failure rate and failed deployment recovery time (often expressed as MTTR), measure stability: how often your changes break things and how fast you recover when they do. The research behind DORA found that high performers score well on both pairs at once. Speed and safety are not a trade-off in a well-built pipeline; they rise together, because the same practices that make deploys frequent (small batches, strong automation, and fast rollback) are exactly what make deploys safe.

This is why pipeline quality maps so directly onto the four keys. A slow, manual pipeline lengthens lead time and depresses deployment frequency. A pipeline with weak tests and no progressive delivery raises change failure rate. A pipeline with no automated rollback lengthens recovery time. Improving the pipeline improves all four metrics, which is the most legible way to demonstrate the value of investing in CI/CD to leadership.
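If the pipeline records a commit time, a deploy time, and an outcome for each deploy, the four keys fall out of simple arithmetic. The sketch below uses a hypothetical `Deploy` record and made-up sample data, and reports medians, which is one reasonable choice among several.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class Deploy:
    committed_at: datetime
    deployed_at: datetime
    failed: bool = False
    restored_at: Optional[datetime] = None  # set only for failed deploys


def hours(delta):
    return delta.total_seconds() / 3600


def dora_keys(deploys, window_days):
    """Compute the four keys over a window. Assumes at least one deploy."""
    failures = [d for d in deploys if d.failed]
    recoveries = [hours(d.restored_at - d.deployed_at)
                  for d in failures if d.restored_at is not None]
    return {
        "deploys_per_day": len(deploys) / window_days,
        "median_lead_time_hours": median(hours(d.deployed_at - d.committed_at)
                                         for d in deploys),
        "change_failure_rate": len(failures) / len(deploys),
        "median_recovery_hours": median(recoveries) if recoveries else 0.0,
    }


t0 = datetime(2026, 5, 4, 9, 0)
day = timedelta(days=1)
sample = [
    Deploy(t0, t0 + timedelta(minutes=30)),
    Deploy(t0 + timedelta(hours=2), t0 + timedelta(hours=3)),
    Deploy(t0 + day, t0 + day + timedelta(hours=2), failed=True,
           restored_at=t0 + day + timedelta(hours=2, minutes=15)),
    Deploy(t0 + day + timedelta(hours=4), t0 + day + timedelta(hours=4, minutes=30)),
]

keys = dora_keys(sample, window_days=2)
print(keys)
```

For this sample the result is two deploys per day, a median lead time of 0.75 hours, a change failure rate of 0.25, and a median recovery time of 0.25 hours.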

CI/CD meets reliability and AI

Here is the uncomfortable truth that connects the pipeline to operations: change is the single largest cause of production incidents. The pipeline is the thing that ships change. So the pipeline is also, indirectly, the thing that ships most of your incidents. A faster pipeline with no safety net does not make you more reliable; it just makes you break things faster. The reliability win comes from pairing pipeline speed with the safety mechanisms that catch a bad change before it reaches everyone.

The loop that closes the gap has three parts. Progressive delivery (canary or feature flags) ensures a regression touches a small slice of users first. Automated rollback, wired to health checks, reverses a bad deploy in seconds without waiting for a human decision. And observability gives the deploy something to be judged against: the metrics, logs, and traces that say whether the new version is actually healthy. Put together, a regression the pipeline introduced is detected, correlated to the deploy that caused it, and reversed before it becomes an outage. That is the difference between a pipeline that ships incidents and a pipeline that catches them.
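The "correlated to the deploy that caused it" step can be approximated with a very simple heuristic: pick the most recent deploy that landed shortly before the regression began. The sketch below is only that heuristic, with invented deploy identifiers and an assumed thirty-minute lookback; it is not a description of how any particular product performs correlation.

```python
from datetime import datetime, timedelta


def suspect_deploy(deploys, regression_at, lookback=timedelta(minutes=30)):
    """Return the id of the latest deploy within lookback before the regression.

    deploys is a list of (finished_at, deploy_id) pairs. Returns None if no
    deploy falls inside the window.
    """
    candidates = [(at, deploy_id) for at, deploy_id in deploys
                  if timedelta(0) <= regression_at - at <= lookback]
    return max(candidates)[1] if candidates else None


deploy_log = [
    (datetime(2026, 5, 4, 10, 0), "api-41"),
    (datetime(2026, 5, 4, 10, 50), "web-17"),
    (datetime(2026, 5, 4, 11, 5), "api-42"),
]

print(suspect_deploy(deploy_log, datetime(2026, 5, 4, 11, 12)))  # api-42
print(suspect_deploy(deploy_log, datetime(2026, 5, 4, 9, 0)))    # None
```

Time proximity alone will mislead when several deploys land close together, which is why the surrounding text stresses judging each deploy against its own metrics, logs, and traces.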

This is exactly where an AI operations layer earns its place. A change ships through the pipeline, and Nova AI Ops watches the deploy across AWS, GCP, Azure, Linux, and Windows at once. When a regression appears (a latency spike, an error-rate jump, a saturated resource), the platform correlates it back to the specific deploy that introduced it, rather than leaving an on-call engineer to guess at 3am whether the new release is the culprit. Within a policy envelope you define, it then auto-remediates, including triggering the rollback, and escalates to a human only when the situation falls outside that envelope. The pipeline ships the change; the AI layer makes the consequence of a bad change a non-event instead of a page.

For the architecture behind that closed loop, see the guides to AI SRE, Agentic SRE, and self-healing infrastructure, and the foundations in observability and AIOps. The pipeline and the operations layer are two halves of the same delivery loop: one ships change, the other catches what the change breaks.

A 90-day plan and a 10-point checklist

Whether you are building a pipeline from scratch or maturing one that stalled at automated testing, the same sequence applies: fix the foundation, add safe delivery, then close the loop with operations. Here is a practical 90-day plan.

Days 1–30: Fix the pipeline foundation

Get build and test into one automated pipeline that runs on every commit, so main is always proven. Define the pipeline as code in the repository. Make the feedback fast enough (minutes, not tens of minutes) that engineers actually wait for it, by parallelizing tests and running only what a change affects. Adopt trunk-based development with short-lived branches and branch protection that refuses to merge a red build. By day 30, every change is integrated and tested continuously, and main is reliably green.

Days 31–60: Add progressive delivery and rollback

Add quality and security gates: static analysis, dependency and container scanning, and secret detection. Publish an immutable, versioned artifact so what you test is exactly what you deploy. Automate deploy to staging and then production with a safe strategy, canary or blue/green, and wire in post-deploy health checks with automated rollback. By day 60, a deploy is a routine, reversible event rather than a stressful manual ceremony.

Days 61–90: Close the loop with observability and remediation

Connect the pipeline to your observability stack so every deploy is judged against real metrics, logs, and traces. Add automated canary analysis that promotes or aborts based on those signals. Finally, wire in an operations layer that detects a regression, correlates it to the deploy that caused it, and remediates within a policy envelope, including rollback, so a bad change is caught and reversed without a human at 3am. By day 90, velocity and stability rise together rather than trading off.

The 10-point CI/CD checklist

Use this to grade an existing pipeline or to scope a new one. A pipeline that can answer yes to all ten is in excellent shape.

  1. Does every commit trigger an automated build and test run? If integration only happens on long-lived branches, you have automated testing, not continuous integration.
  2. Is test feedback fast enough that engineers wait for it? Minutes, not tens of minutes. Slow feedback quietly kills the discipline.
  3. Is main kept green by branch protection? A red build should block merges automatically, not depend on someone noticing.
  4. Is the pipeline defined as code in the repository? Reviewable, diffable, and rolled back by reverting a commit, not clicked together in a UI.
  5. Do security scans gate the pipeline? Static analysis, dependency and container scanning, and secret detection running on every change.
  6. Is the deployed artifact immutable and versioned? What you tested is exactly what you ship, with no rebuild between test and deploy.
  7. Do you deploy with a safe strategy? Canary, blue/green, or rolling, never an all-at-once replace with no fallback.
  8. Can you roll back fast and automatically? Health-gated rollback in seconds is the real safety net behind frequent deploys.
  9. Do you separate deploy from release with feature flags? Shipping code dark and releasing behavior independently shrinks the risky moment.
  10. Do you track the four DORA metrics? Deployment frequency, lead time, change failure rate, and recovery time, watched over time.

Frequently asked questions

What is CI/CD?
CI/CD is the pairing of continuous integration and continuous delivery (or deployment), the automated pipeline that takes a code change from a commit all the way to running in production. Continuous integration is the practice of merging every developer's work into a shared main branch many times a day and proving each merge with an automated build and test run. Continuous delivery extends that so the tested artifact is always in a deployable state and can be released with one click. Continuous deployment goes one step further and ships every passing change to production with no manual gate. Together they form the backbone of modern software delivery: the mechanism that turns a commit into a running, verified release.
What is the difference between continuous delivery and continuous deployment?
The only difference is the gate before production. In continuous delivery, every change that passes the pipeline is proven deployable and parked in a ready state, but a human presses the button to release it, so the team decides when each release goes out. In continuous deployment, there is no human gate: every commit that passes the full pipeline ships to production automatically. Continuous delivery suits regulated, high-blast-radius, or change-window-constrained environments where a human sign-off is required. Continuous deployment suits teams with deep automated test coverage, fast rollback, and progressive-delivery safety nets that make a bad change cheap to recover from. Both abbreviate to CD, which is why people conflate them.
What is continuous integration?
Continuous integration is the discipline of merging every engineer's work into a shared main branch frequently, ideally several times a day, and proving each merge with an automated build and a fast test suite. The point is to catch integration problems within minutes of the commit that caused them, while the change is small and the author still has the context, instead of discovering them weeks later in a painful big-bang merge. The two practices that make it real are trunk-based development, where everyone commits to short-lived branches off main, and keeping main green, where a broken build is the team's top priority to fix before any new work lands.
What are the stages of a CI/CD pipeline?
A typical pipeline runs through seven stages. Source: a commit or merge to a watched branch triggers the run. Build: the code is compiled and packaged into an artifact or container image. Test: unit tests run first for fast feedback, then integration and end-to-end suites. Security scanning: static analysis, dependency and container scanning, and secret detection gate the change. Artifact: the verified build is published to a registry with an immutable version. Deploy: the artifact is rolled out to staging and then production using a chosen strategy. Verify: post-deploy health checks and observability confirm the release is healthy, and trigger a rollback if it is not. The whole pipeline is itself defined as code in the repository.
What are the main deployment strategies?
Four are common. Rolling deployment replaces instances in batches so the service stays up but old and new versions run side by side during the rollout. Blue/green keeps two identical environments and flips all traffic from the old (blue) to the new (green) at once, giving an instant rollback by flipping back. Canary releases the new version to a small slice of traffic first, watches the metrics, and promotes it only if it stays healthy. Feature flags decouple deploy from release entirely: the code ships dark and you turn the feature on for users with a config change, which can be rolled back without a redeploy. Most mature teams combine canary or blue/green for infrastructure with feature flags for product changes.
What is the difference between deploy and release?
Deploy means the new code is running on production servers. Release means users can actually see and use the new behavior. Feature flags separate the two: you deploy the code dark, with the new feature wrapped in a flag that is off, so it is running in production but invisible. Later you release it by flipping the flag on, for everyone or a percentage, with no redeploy. This separation is powerful for reliability because it shrinks the risky moment. Deploying carries the infrastructure risk and releasing carries the product risk, and handling them separately means a bad feature can be turned off in seconds without rolling back the whole deploy.
What are the DORA metrics for CI/CD?
DORA defines four keys that measure delivery performance. Deployment frequency: how often you ship to production, where elite teams deploy on demand, multiple times a day. Lead time for changes: how long it takes a commit to reach production, less than one hour for elite teams. Change failure rate: the share of deploys that cause a degradation needing a fix or rollback, ideally under fifteen percent. Failed deployment recovery time, often expressed as MTTR: how fast you restore service when a change breaks it, less than one hour for elite teams. The first two measure throughput and the last two measure stability, and good pipelines improve all four at once rather than trading speed for safety.
How is CI/CD different from DevOps automation?
A CI/CD pipeline is one part of DevOps automation, not the whole of it. CI/CD is the focused discipline of getting a change from a commit into production: build, test, integrate, and deploy. DevOps automation is the wider surface that also includes provisioning infrastructure as code, holding configuration in a desired state, and operating and remediating the system once it is live. This page is the deep dive on the pipeline itself. For the broader automation surface that the pipeline sits inside, see the DevOps automation guide, and for the culture and practices around it, see the DevOps guide.
Why is CI/CD important for reliability?
Because change is the single largest cause of production incidents, and the pipeline is the thing that ships change. A pipeline that deploys often in small increments makes each change easy to reason about and cheap to roll back, which lowers the blast radius of any one bad deploy. A pipeline with progressive delivery and automated rollback turns a regression from an outage into a non-event, because the bad version never reaches most users. The corollary is that a fast pipeline with no safety net is dangerous: it just ships breakage faster. The reliability win comes from pairing pipeline speed with canary analysis, health-gated deploys, and instant rollback so velocity and stability rise together.
How do I build or improve a CI/CD pipeline?
Start by getting build and test into one automated pipeline that runs on every commit, so main is always proven, then make the feedback fast enough that engineers actually wait for it. Next, add quality and security gates and publish an immutable, versioned artifact so what you test is exactly what you deploy. Then automate deploy to staging and production with a safe strategy such as canary or blue/green, and wire in post-deploy health checks with automated rollback. A practical sequence over ninety days is to fix the pipeline foundation in the first month, add progressive delivery and rollback in the second, and close the loop with observability and automated remediation in the third, so a regression the pipeline introduces is caught and reversed without a human at 3am.

CI/CD is the focused pipeline view; go up and out from here. The broader automation surface this pipeline lives inside is DevOps automation, and the culture around it is DevOps. The sibling discipline for provisioning is infrastructure as code. On reliability foundations: site reliability engineering, AI SRE, Agentic SRE, and AIOps. On the metrics and practices a pipeline moves: MTTR, SLOs and error budgets, incident management, and self-healing infrastructure. On telemetry and operations: observability, monitoring, and chaos engineering. On the day-to-day work: eliminating toil, building runbooks, and running blameless postmortems. For teams shipping AI systems: LLMOps and the AI engineer's guide to production reliability. See it all in one place on the features overview.


Nova AI Ops is the Multi-Agent OS for SRE & DevOps. 100 specialized AI agents across 12 teams watch every deploy across AWS, GCP, Azure, Linux, and Windows, correlate regressions to the change that caused them, and roll back within a policy envelope. Free tier available for small teams.