CI/CD & GitOps, Practical. By Samson Tanimawo, PhD. Published Aug 30, 2025.

Rollback vs Roll-Forward

When a deploy goes bad, there are two recovery strategies: roll back to the previous version, or roll forward with a fix.

Rollback brings the old version back

Fast. Predictable. The previous version was working, so reverting restores known-good state.

Required when: the bug is severe, the fix is non-trivial, or you don't yet understand the issue.

Cost: forward progress is lost. If the schema changed, data written under the new version may not be readable by the old one.

Roll-forward fixes the bug live

Suited to: trivial bugs with obvious fixes, schema-breaking changes that can't easily be rolled back, and time-sensitive features.

Risk: it pushes a second change through the deploy pipeline during an active incident, which compounds the risk of the first.

Cost: under stress, engineers ship buggy fixes. "Roll forward" can become "roll forward into a worse state."

How to decide

Default to rollback. The known-good state is safer than the unknown new state.

Roll forward only when: the rollback would lose user data, the schema cannot be reverted, or the fix is genuinely trivial and reviewed.
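The rule above can be sketched as a small decision function. This is a minimal illustration, not a prescribed tool: the Incident fields and the choose_recovery name are hypothetical, and in practice a human fills in the answers during the incident.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    ROLLBACK = "rollback"
    ROLL_FORWARD = "roll_forward"


@dataclass
class Incident:
    # Hypothetical inputs, answered by whoever is running the incident.
    rollback_loses_user_data: bool
    schema_is_revertible: bool
    fix_is_trivial: bool
    fix_is_reviewed: bool


def choose_recovery(incident: Incident) -> tuple[Action, str]:
    """Return the recovery action and a one-line reason for the timeline."""
    if incident.rollback_loses_user_data:
        return Action.ROLL_FORWARD, "rollback would lose user data"
    if not incident.schema_is_revertible:
        return Action.ROLL_FORWARD, "schema cannot be reverted"
    # A trivial fix only qualifies if it has also been reviewed.
    if incident.fix_is_trivial and incident.fix_is_reviewed:
        return Action.ROLL_FORWARD, "fix is trivial and reviewed"
    # Everything else falls through to the default.
    return Action.ROLLBACK, "default: known-good state is safer"
```

Returning the reason alongside the action makes it easy to paste the decision straight into the incident timeline.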

Document the decision in the incident timeline. Future postmortems will reference it.

Schema-aware rollback

Forward-compatible migrations: add columns; never remove them in the same change. The old version ignores the new columns and still works.

Backward-compatible code: don't drop columns until all readers have moved off them.

This shape lets you roll back code without rolling back data. The upfront cost is real; the rollback freedom is worth it.
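To make the additive pattern concrete, here is a minimal sketch using Python's built-in sqlite3 module and an in-memory database. The users table, the email column, and the old_version_* / new_version_* helpers are all hypothetical stand-ins for two releases of the same service.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")


def old_version_write(conn, name):
    # Old code names its columns explicitly, so extra columns do not break it.
    conn.execute("INSERT INTO users (name) VALUES (?)", (name,))


def old_version_read(conn):
    return conn.execute("SELECT id, name FROM users ORDER BY id").fetchall()


def new_version_write(conn, name, email):
    conn.execute("INSERT INTO users (name, email) VALUES (?, ?)", (name, email))


old_version_write(conn, "ada")

# Additive migration: the new column is nullable, so writes that omit it succeed.
conn.execute("ALTER TABLE users ADD COLUMN email TEXT")
new_version_write(conn, "grace", "grace@example.com")

# Simulated code rollback: the old version runs against the migrated schema.
old_version_write(conn, "linus")
rows = old_version_read(conn)

# Data written by the new version is still in place after the code rollback.
grace_email = conn.execute(
    "SELECT email FROM users WHERE name = ?", ("grace",)
).fetchone()[0]
```

The old code keeps working because it never relies on the full column list (no SELECT * or positional INSERT), and the new column accepts NULL. Dropping the column later would be a separate change, made only once no deployed version reads it.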

Operational rules

Practice rollback in staging quarterly. The first real rollback should not be the first attempt.

Auto-rollback on SLO regression for low-blast-radius services. For high-blast-radius services, keep the decision manual.
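The split between automatic and manual rollback might be expressed as a small policy function. This is a sketch under assumptions: the Service type, the "low"/"high" blast-radius labels, the error-rate threshold, and the returned action names are all hypothetical and would map onto whatever your deploy tooling and SLO definitions actually provide.

```python
from dataclasses import dataclass


@dataclass
class Service:
    name: str
    blast_radius: str  # hypothetical label: "low" or "high"


def rollback_action(service: Service, error_rate: float, slo_max_error_rate: float) -> str:
    """Return 'none', 'auto_rollback', or 'page_human' for a fresh deploy."""
    if error_rate <= slo_max_error_rate:
        return "none"  # within SLO, nothing to do
    if service.blast_radius == "low":
        return "auto_rollback"  # safe to revert without a human in the loop
    # High blast radius (or an unknown label): a person makes the call.
    return "page_human"
```

Treating any label other than "low" as manual is a deliberate choice here: a mislabeled service then fails toward human review rather than toward an unattended rollback.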

Have a rollback playbook in the runbook. The on-call should not be inventing the procedure during a sev1.