Kubernetes · By Nova AI Ops Team · Published Sep 23, 2026 · 13 min read

Kubernetes Deployment Strategies: Rolling, Blue-Green, Canary Compared

Kubernetes ships with two built-in deployment strategies. Most production teams need three more. Picking the wrong strategy turns a routine release into a customer-impacting incident.

The Five Strategies (and Two More You Don't Need)

Five deployment strategies cover 99% of real Kubernetes use cases: Rolling, Recreate, Blue-Green, Canary, and A/B Test. Kubernetes ships native support for the first two via the Deployment API. The other three require either custom tooling or a progressive delivery controller like Argo Rollouts or Flagger.

Two more strategies appear in the deployment literature (Shadow and Dark Launch), but both are really patterns layered on top of the five above. Treat them as building blocks rather than first-class strategies.

Strategy 1: Rolling Update (the default)

How it works: Kubernetes gradually replaces old pods with new pods, one (or a few) at a time. Configurable via maxSurge (how many extra pods can run during the rollout) and maxUnavailable (how many pods can be down).

spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 0  # capacity never drops below the full replica count

When to use: The right default for stateless services where some old + new mix during the rollout is acceptable.

Rollback: kubectl rollout undo deployment <name>. Reverts to the previous ReplicaSet. Takes 1-2 minutes for a typical deployment.

Limitations: Cannot test the new version with real traffic before fully committing. The mix of old and new versions during the rollout creates short-lived consistency edge cases (a request handled by an old pod, a follow-up handled by a new pod).
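Zero-downtime rolling updates also depend on accurate readiness signals: with maxUnavailable: 0, Kubernetes only removes an old pod after a new one reports ready. A minimal sketch of the pod template (the probe path, port, and image tag are hypothetical):

```yaml
spec:
  template:
    spec:
      containers:
      - name: app
        image: payment-svc:v2     # hypothetical image tag
        readinessProbe:
          httpGet:
            path: /healthz        # assumed health endpoint
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 10
```

Without a readiness probe, Kubernetes counts a pod as available the moment its container starts, and the rollout can outrun the application's actual ability to serve traffic.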

Strategy 2: Recreate

How it works: Kubernetes terminates all old pods, then starts new pods. Causes downtime equal to the pod startup time.

spec:
  strategy:
    type: Recreate

When to use: Workloads that cannot tolerate two versions running simultaneously (e.g., schema migration scripts, leader-election jobs that assume single-version, stateful systems with version-dependent on-disk format).

Rollback: Same as rolling, kubectl rollout undo. Costs another round of downtime.

Limitations: Causes downtime by design. Inappropriate for any user-facing workload.

Strategy 3: Blue-Green

How it works: Deploy the new version (green) alongside the existing version (blue) at full replica count. Cut traffic over instantaneously by changing the Service selector. Keep blue running for a rollback window.

Native Kubernetes does not support blue-green directly. The simplest pattern uses two Deployments and updates the Service's selector to switch:

apiVersion: v1
kind: Service
metadata:
  name: payment-svc
spec:
  selector:
    app: payment-svc
    version: blue  # change to green to cut over
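Each color is its own Deployment whose pod labels carry the version value the Service selects on. A sketch of the blue side (names and image tag hypothetical; the green Deployment is identical except for the version label and image):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payment-svc-blue
spec:
  replicas: 4
  selector:
    matchLabels:
      app: payment-svc
      version: blue
  template:
    metadata:
      labels:
        app: payment-svc
        version: blue        # green Deployment uses version: green
    spec:
      containers:
      - name: payment-svc
        image: payment-svc:v1   # hypothetical image tag
```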

When to use: Workloads where you need an instantaneous cutover with the option of an instantaneous rollback. Common for stateful services with version-incompatible data formats.

Rollback: Change the Service selector back. Sub-second.

Limitations: Requires double the resources for as long as both versions run. Stateful services need careful data-migration planning. Cannot test with a small percentage of real traffic before the full cutover (that is canary's job).

Strategy 4: Canary

How it works: Deploy the new version alongside the old, route a small percentage of real traffic (e.g., 5%) to it, validate metrics, then gradually increase the percentage until it reaches 100%.

Native Kubernetes can approximate a crude canary by running two Deployments and weighting the replica counts (5 old pods + 1 new pod ≈ 17% canary traffic). A real canary requires Argo Rollouts or Flagger for fine-grained traffic splitting and automated promotion or rollback based on metrics.

apiVersion: argoproj.io/v1alpha1
kind: Rollout
spec:
  strategy:
    canary:
      steps:
      - setWeight: 5
      - pause: {duration: 5m}
      - setWeight: 25
      - pause: {duration: 10m}
      - setWeight: 50
      - pause: {duration: 30m}
      - setWeight: 100
      analysis:
        templates:
        - templateName: success-rate-check

When to use: Any user-facing service where you want to catch regressions with real traffic before they affect everyone. The right default for high-stakes services.

Rollback: Automatic when the analysis template fails (e.g., error rate above 1%). Manual via kubectl argo rollouts abort. Both happen within seconds.

Limitations: Requires investment in metrics-based analysis (the canary needs to know what "good" looks like). Bad analysis templates either approve regressions or block legitimate releases.
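For reference, an analysis template like the success-rate-check referenced in the Rollout above might look as follows. This is a sketch: the Prometheus address, metric names, and thresholds are assumptions about your environment, not values the Rollout spec dictates.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: success-rate-check
spec:
  metrics:
  - name: success-rate
    interval: 1m
    count: 5
    successCondition: result[0] >= 0.99   # abort the canary below 99% success
    failureLimit: 1
    provider:
      prometheus:
        address: http://prometheus.monitoring.svc:9090   # assumed Prometheus location
        query: |
          sum(rate(http_requests_total{service="payment-svc",code!~"5.."}[5m]))
          /
          sum(rate(http_requests_total{service="payment-svc"}[5m]))
```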

Strategy 5: A/B Test

How it works: Like canary, but the traffic split is based on user attributes (cookie, geography, header) rather than a percentage. Used for product experiments, not infrastructure rollouts.

Requires a service mesh (Istio, Linkerd) or a feature-flag service (LaunchDarkly, Statsig) to implement the routing logic.
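With Istio, attribute-based routing can be sketched as a VirtualService that matches on a request header. The host, header name, and subset names here are hypothetical, and the v1/v2 subsets would be defined in a companion DestinationRule:

```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: checkout
spec:
  hosts:
  - checkout
  http:
  - match:
    - headers:
        x-experiment:
          exact: new-checkout   # users bucketed into the experiment
    route:
    - destination:
        host: checkout
        subset: v2
  - route:                      # everyone else stays on the current flow
    - destination:
        host: checkout
        subset: v1
```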

When to use: When the change is a product hypothesis (does the new checkout flow convert better?) rather than an infrastructure rollout.

Rollback: Disable the feature flag or change the traffic split. Sub-second.

Limitations: Adds significant operational complexity. Use only when product experimentation is a real organizational priority.

Decision Framework

Three questions short-circuit most decisions:

1. Is this a stateless service? If yes, default to canary if you have Argo Rollouts/Flagger, rolling otherwise.

2. Is the new version data-incompatible with the old? If yes, blue-green or recreate. Pick blue-green if downtime is unacceptable, recreate if a brief downtime window is fine.

3. Is this a product experiment, not just an infrastructure rollout? If yes, A/B test with a feature flag service.

Tooling Recommendations

Argo Rollouts: The most popular Kubernetes-native progressive delivery controller. Supports canary, blue-green, and analysis-based promotion. Pairs well with ArgoCD for the GitOps loop.

Flagger: The Flux ecosystem's progressive delivery controller. Supports the same strategies as Argo Rollouts. Better integration with Flux GitOps workflows.

Service mesh (Istio, Linkerd, Cilium): Provides traffic-splitting primitives that both Argo Rollouts and Flagger can use. Required for A/B testing based on request attributes.

For teams that want deployment safety automated end-to-end, including the analysis template authoring, canary metrics evaluation, and automatic rollback decisions, AI-native platforms like Nova AI Ops integrate with Argo Rollouts and Flagger to provide AI-driven canary analysis that catches regressions traditional analysis templates miss. Try Nova.