Deploy Anti-Patterns 2026
Common mistakes in CD.
Manual deploys
The most common deploy anti-pattern in 2026 is the one that looks responsible: a senior engineer carefully running a deploy script, watching it complete, and confirming production health by hand. It feels like rigor; it is actually fragility dressed up as caution. Manual deploys cause more incidents than the code changes they ship.
Why manual deploys produce worse outcomes:
- Ad-hoc and non-reproducible: Each manual deploy is slightly different. The engineer ran the migration before the deploy this time but after it last time. Environment variables were exported in a slightly different order. The result is a deploy that succeeded today and fails next month, with no way to reproduce what was different.
- Tribal knowledge required: "Deploys are easy, just ask Sarah" is the team's worst single point of failure. The deploy procedure lives in Sarah's head and in the muscle memory of two other engineers who have done it enough times. New hires cannot deploy. Sarah cannot take vacation during a release window.
- Slow feedback when something is wrong: Manual deploys do not fire automated burn-rate alerts during canary because there is no canary. The engineer eyeballs the prod metrics for a few minutes after the deploy. Subtle regressions that an automated 30-minute soak would catch slip through because the human attention span is not 30 minutes.
- Audit trail is whatever the engineer remembered to write down: Compliance, debugging, and root-cause work all suffer because the deploy is not recorded as structured events. "Deployed at around 2 PM by Mark" is not an audit trail; it is a hint.
- Avoid by automating, not by adding more checklists: Adding more steps to the manual procedure does not fix the problem; it adds more places to make a mistake. The fix is to make the deploy a single button, gated by automated tests, soak windows, and burn-rate analysis. Engineers approve; they do not execute. (A sketch of such a gate follows this list.)
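What that gate can look like, in rough form: hold the canary for a fixed soak window, check the error-budget burn rate on a schedule, and record every decision as a structured event. This is a minimal sketch, assuming your metrics and deploy tooling sit behind the burn_rate, promote, rollback, and record_event callables; the names are placeholders, not any specific vendor's API.

```python
import time
from datetime import datetime, timezone

SOAK_MINUTES = 30
BURN_RATE_LIMIT = 1.0  # burning error budget faster than 1x ends the canary


def soak_and_promote(burn_rate, promote, rollback, record_event) -> bool:
    """Hold the canary for the soak window, checking burn rate once a minute."""
    deadline = time.monotonic() + SOAK_MINUTES * 60
    while time.monotonic() < deadline:
        rate = burn_rate()
        if rate > BURN_RATE_LIMIT:
            rollback()
            record_event({"ts": datetime.now(timezone.utc).isoformat(),
                          "action": "rollback", "burn_rate": rate})
            return False
        time.sleep(60)
    promote()
    record_event({"ts": datetime.now(timezone.utc).isoformat(),
                  "action": "promote"})
    return True
```

The structured events double as the audit trail: every promote and rollback becomes a timestamped record, not a Slack message someone has to remember to write.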
The teams that retire manual deploys are the same teams whose incident rate drops within a quarter. The investment is real, the payback is fast, and the operational improvement is durable.
Untested rollback
The second most common anti-pattern is having a documented rollback procedure that nobody has ever actually run. The procedure exists in a runbook. It looks reasonable. It has steps. It would, in theory, restore service if a deploy went bad. In practice, the first time it runs is during an incident, and the runbook turns out to have a typo, a stale step, or a missing prerequisite.
- Trust in an untested rollback is theatre: A rollback procedure that has not been exercised is a hopeful document. The team feels safer because the runbook exists. The actual probability that the rollback works is unknown, and "unknown" during an incident is the same as "low."
- Test quarterly, in production: Once a quarter, deliberately deploy a no-op change to production, then run the rollback. Verify the rollback completes, the previous version is serving traffic, and the deploy pipeline can promote forward again. This sounds risky; the alternative is finding out during a real incident. (A minimal drill sketch closes this section.)
- Game day exercises: Schedule a half-day per quarter where the on-call walks through three or four incident scenarios, including rollback. Runbooks get updated where they were wrong. Tooling gets fixed where it was broken. The team builds muscle memory.
- Track rollback time as a metric: Mean time to rollback is a first-class metric. If it is over 15 minutes, you are running incidents on hope. The target is under 5 minutes for any change deployed in the past 24 hours.
- Forward-compatible migrations are a prerequisite: If your migration drops a column the previous version reads, your rollback breaks worse than your forward fix. Schema changes that allow rollback are the underlying discipline that makes rollback testing meaningful; the expand/contract sketch after this list shows the shape.
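The expand/contract shape, in miniature. This is an illustrative demo against an in-memory SQLite database standing in for production, and the table and column names are made up. The point is that the expand step is safe to roll back across because the previous version never references the new column, while the contract step is deferred until rollback is off the table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, address TEXT)")
conn.execute("INSERT INTO orders (address) VALUES ('10 Main St')")

# Expand phase, shipped with (or ahead of) version N: add the new column, nullable.
conn.execute("ALTER TABLE orders ADD COLUMN address_v2 TEXT")

# The version N-1 read path still works after a rollback, because it never
# touches address_v2.
print(conn.execute("SELECT id, address FROM orders").fetchone())  # (1, '10 Main St')

# Contract phase (dropping the old column) ships only after version N has been
# stable past the rollback window. Dropping it earlier is exactly what makes a
# rollback break worse than the forward fix.
```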
An untested rollback is a hopeful artifact. A tested rollback is a real safety net. The difference is one game day a quarter.
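A quarterly drill can be as small as the following sketch. The deploy, rollback, and serving_version callables are placeholders for whatever your pipeline exposes; the shape of the checks, and the fact that the elapsed time is measured against the 5-minute target, is the point.

```python
import time


def rollback_drill(deploy, rollback, serving_version, noop_ref, previous_ref):
    """Deploy a no-op change, roll it back, and return time-to-rollback in seconds."""
    deploy(noop_ref)
    assert serving_version() == noop_ref, "no-op deploy did not take effect"

    started = time.monotonic()
    rollback()
    elapsed = time.monotonic() - started

    assert serving_version() == previous_ref, "rollback did not restore the previous version"
    assert elapsed < 300, f"rollback took {elapsed:.0f}s; target is under 5 minutes"

    # Re-running deploy(noop_ref) here confirms the pipeline can still promote forward.
    return elapsed
```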
Permanent freeze
The third anti-pattern is the indefinite deploy freeze. It usually starts with good intent: there has been an incident, leadership wants stability, the team agrees to slow down. Then the freeze never lifts. Three months later, the team is sitting on hundreds of staged changes that all need to land at once whenever the freeze finally ends.
- Long freezes accumulate change debt: Every day the freeze is in place, more changes pile up in branches and review queues. The longer the freeze, the larger the eventual unfreeze deploy. A 3-month freeze followed by a big-bang release is all but guaranteed to cause an incident larger than the one the freeze was meant to prevent.
- Loss of deploy muscle: Teams that do not deploy regularly forget how. Tooling rots, runbooks go stale, and on-call rotations adapt to a no-deploy world. When the freeze finally lifts, the team's ability to deploy safely has degraded in proportion to the freeze's length.
- Time-bound, always: Every freeze has a defined start, a defined end, and a defined trigger for early lift. "Two-week freeze, lifts when the contributor incidents are postmortemed and the burn rate has been clean for 7 days" is a real freeze. "Freeze until further notice" is an indefinite one and will become a permanent one. (A freeze written down as data, with these fields explicit, is sketched after this list.)
- Targeted, not blanket: Freezes should target the specific area that was broken, not the entire engineering org. A regression in payments does not justify freezing search. Blanket freezes are usually the result of leadership wanting to be seen doing something, not the result of a specific risk analysis.
- Address the cause, not the symptom: A freeze is a temporary measure to buy time for a structural fix. If the root cause was missing test coverage, the freeze ends when the coverage lands. If it was a fragile dependency, it ends when the dependency is hardened. A freeze without a corresponding fix is decoration.
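Writing the freeze down as data keeps it honest: the end date and the early-lift condition exist whether or not anyone remembers the Slack thread. A minimal sketch, with illustrative field names and values:

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class DeployFreeze:
    scope: str            # the service or area that is frozen, not the whole org
    starts: date
    ends: date            # hard end date; extending it means declaring a new freeze
    lift_early_when: str  # condition that lifts the freeze before the end date


payments_freeze = DeployFreeze(
    scope="payments",
    starts=date(2026, 3, 2),
    ends=date(2026, 3, 16),
    lift_early_when="contributor incidents postmortemed and burn rate clean for 7 days",
)
```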
Manual deploys, untested rollbacks, and permanent freezes are the three deploy anti-patterns that quietly cost the most. Nova AI Ops watches deploy frequency, rollback test cadence, and freeze duration as engineering health metrics, and surfaces the patterns that are eroding the team's ability to ship safely before they show up as incidents.