Progressive vs Rolling: Decision Math
Choosing a deployment strategy is a trade-off between operational cost and safety.
Progressive and rolling are different
Rolling: replace pods one batch at a time. No traffic distinction; each new pod takes its full share of traffic as soon as it is ready.
Progressive (canary): new version gets a fraction of traffic. Old version handles the rest. Promote based on metrics.
The two can coexist: a progressive deploy is essentially a rolling update with traffic gating layered on top.
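To make the difference concrete, here is a minimal sketch of how much traffic reaches the new version at each step under the two strategies. The function names, the pod counts, and the canary weights are illustrative assumptions, and the rolling case assumes traffic is spread evenly across pods.

```python
def rolling_exposure(total_pods: int, batch_size: int) -> list[float]:
    """Fraction of traffic on the new version after each rolling batch.

    Assumes traffic is spread evenly across pods, so exposure simply
    tracks the share of pods already replaced.
    """
    exposure = []
    replaced = 0
    while replaced < total_pods:
        replaced = min(replaced + batch_size, total_pods)
        exposure.append(replaced / total_pods)
    return exposure


def canary_exposure(weights_percent: list[int]) -> list[float]:
    """Fraction of traffic on the new version at each canary step.

    The weights are set explicitly at the traffic layer, independent of
    how many new pods happen to be running.
    """
    return [w / 100 for w in weights_percent]


if __name__ == "__main__":
    # Hypothetical numbers: 10 pods replaced 3 at a time,
    # versus a canary that starts at 1% of traffic.
    print(rolling_exposure(total_pods=10, batch_size=3))
    print(canary_exposure([1, 5, 25, 50, 100]))
```

With these numbers the first rolling batch already puts 30% of traffic on the new version, while the first canary step exposes 1%. In the rolling case the exposure is a side effect of the batch size; in the canary case it is a parameter you choose.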
When rolling is enough
Stateless services with strong test coverage. The new pod either works or it doesn't, so partial traffic adds little safety.
Internal tools, low-impact services. The cost of a few minutes of bad pods is low.
When canary infrastructure isn't there. Rolling deploys work with vanilla Kubernetes; progressive needs Argo Rollouts, Flagger, or a service mesh.
When progressive wins
Customer-facing services with SLOs. Progressive allows automated rollback before most customers feel the impact.
High-blast-radius changes. Database migrations, auth changes, payment paths.
Services with rare bugs. Progressive surfaces them on a small fraction of traffic; rolling exposes everyone.
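The blast-radius argument can be put into rough numbers. The sketch below is a back-of-the-envelope model, not a measurement: the function names, the request rate, the bug hit rate, and the detection window are all assumed for illustration. It also estimates how long a canary must run before a rare bug is likely to show up at all, since a small cohort sees fewer requests.

```python
import math


def failed_requests_before_rollback(
    requests_per_second: float,
    new_version_traffic_fraction: float,
    bug_hit_rate: float,
    seconds_to_detect: float,
) -> float:
    """Expected failed requests between deploy and rollback.

    bug_hit_rate is the fraction of requests to the new version that
    trigger the bug (an assumed input, not something the deploy tool knows).
    """
    return (
        requests_per_second
        * new_version_traffic_fraction
        * bug_hit_rate
        * seconds_to_detect
    )


def seconds_to_observe_bug(
    requests_per_second: float,
    new_version_traffic_fraction: float,
    bug_hit_rate: float,
    confidence: float = 0.95,
) -> float:
    """Seconds of traffic needed to see at least one bug hit with the given probability."""
    if not (0 < bug_hit_rate < 1 and 0 < confidence < 1):
        raise ValueError("bug_hit_rate and confidence must be in (0, 1)")
    # P(no hit in n requests) = (1 - p)^n; solve (1 - p)^n = 1 - confidence for n.
    requests_needed = math.log(1 - confidence) / math.log(1 - bug_hit_rate)
    return requests_needed / (requests_per_second * new_version_traffic_fraction)


if __name__ == "__main__":
    # Hypothetical service: 1000 req/s, bug hits 0.1% of requests, 5 min to detect.
    print(failed_requests_before_rollback(1000, 1.0, 0.001, 300))   # full exposure
    print(failed_requests_before_rollback(1000, 0.05, 0.001, 300))  # 5% canary
    print(seconds_to_observe_bug(1000, 0.05, 0.001))
```

Under these assumptions, full exposure costs about 300 failed requests in the detection window against about 15 for a 5% canary. The second function shows the flip side: the rarer the bug and the smaller the canary, the longer each step has to bake before the metrics can say anything.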
Infrastructure cost
Progressive needs traffic shifting: a service mesh (Istio, Linkerd), or an ingress controller or load balancer that supports weighted routing, driven by a tool such as Argo Rollouts.
Per-cohort metrics: error rate and latency for canary vs baseline. Without them there is no signal to promote or roll back on, so progressive adds no safety.
Operational learning curve. Engineers must understand the progressive deploy abstraction.
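The per-cohort comparison is the piece that turns traffic splitting into a safety mechanism. A minimal sketch of such a gate follows. The `CohortStats` type, the `canary_verdict` function, and every threshold in it are hypothetical; real tools such as Argo Rollouts and Flagger express this kind of check in their own configuration rather than in code like this.

```python
from dataclasses import dataclass


@dataclass
class CohortStats:
    """Metrics for one cohort (canary or baseline) over the same time window."""
    requests: int
    errors: int
    p99_latency_ms: float

    @property
    def error_rate(self) -> float:
        return self.errors / self.requests if self.requests else 0.0


def canary_verdict(
    canary: CohortStats,
    baseline: CohortStats,
    min_requests: int = 500,              # assumed: below this, the sample is too small
    max_error_rate_delta: float = 0.005,  # assumed: allowed absolute error-rate increase
    max_latency_ratio: float = 1.2,       # assumed: allowed p99 latency regression
) -> str:
    """Return "wait", "rollback", or "promote" by comparing canary to baseline."""
    if canary.requests < min_requests:
        return "wait"
    if canary.error_rate - baseline.error_rate > max_error_rate_delta:
        return "rollback"
    if (
        baseline.p99_latency_ms > 0
        and canary.p99_latency_ms / baseline.p99_latency_ms > max_latency_ratio
    ):
        return "rollback"
    return "promote"


if __name__ == "__main__":
    baseline = CohortStats(requests=100_000, errors=100, p99_latency_ms=200.0)
    bad_canary = CohortStats(requests=1_000, errors=20, p99_latency_ms=210.0)
    print(canary_verdict(bad_canary, baseline))
```

Note that the function needs canary and baseline numbers separately. If the metrics pipeline only reports a blended error rate, a canary at 5% of traffic is diluted by the healthy 95% and the gate cannot see the regression.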
Decision rule
Customer-facing tier-1 services: progressive deploys.
Backend services with strong test coverage: rolling.
Database migrations: progressive (per-region, per-shard).
Don't put progressive on every service; the operational cost outweighs the benefit for low-risk services.
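The rule above is small enough to write down as a function, which can be handy as a default in a service catalog or deploy template. This is only a sketch: the `Service` fields and the `choose_strategy` name are hypothetical, and falling through to rolling for cases the rule does not name is an assumption of the sketch.

```python
from dataclasses import dataclass


@dataclass
class Service:
    name: str
    customer_facing: bool
    tier: int  # 1 = most critical (assumed convention)
    is_database_migration: bool = False


def choose_strategy(svc: Service) -> str:
    """Map a service to a deploy strategy following the decision rule."""
    if svc.is_database_migration:
        return "progressive (per-region, per-shard)"
    if svc.customer_facing and svc.tier == 1:
        return "progressive"
    # Default for everything else, so progressive is not applied everywhere.
    return "rolling"


if __name__ == "__main__":
    print(choose_strategy(Service("checkout", customer_facing=True, tier=1)))
    print(choose_strategy(Service("report-builder", customer_facing=False, tier=3)))
```

Making rolling the default and progressive the explicit exception keeps the operational cost confined to the services where it pays off.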