Failed Deploy Cleanup
Failed deploys leave artifacts. Clean up.
Failed deploys leave artifacts
Half-applied Terraform changes. Partial Helm upgrades stuck in pending state. Orphaned ALB target groups.
Each artifact is a future incident waiting to happen.
Without cleanup discipline, the environment accumulates broken state.
Cleanup checklist
Terraform: `terraform state list` and reconcile partial state.
Helm: `helm history release` and `helm rollback` to last good revision.
Cloud resources: orphan check on ELBs, target groups, IAM roles, security groups.
Automate detection
Drift detection (Terraform plan, Driftctl) catches lingering changes.
Resource naming convention with deploy ID. Orphans without an active deploy are auto-flagged.
Daily report: failed deploys + cleanup status. Open items get assigned.
Rollback discipline
Define rollback procedure for each deploy type before the deploy. Don't improvise during failure.
Practice rollback in staging quarterly. The first real rollback should not be the first attempt.
Auto-rollback on SLO regression for low-blast-radius services. Manual for high-blast-radius.
How to install the discipline
Add cleanup as a required step in the deploy runbook.
Failed deploy automatically opens a ticket assigned to the deployer. Close on cleanup.
Quarterly audit: residual orphans by team. Visibility drives accountability.