Cloud Custodian for Cleanup
Cloud Custodian automates cleanup of unused resources.
Setup
Cloud Custodian is an open-source policy engine for cloud cleanup. Teams use it to find and remediate unused resources, untagged resources, and policy violations. The goal is to automate cleanup that would otherwise require manual effort.
What setup looks like:
- YAML-based policies: Cloud Custodian policies are written in YAML. Each policy declares which resource type to query, which filters to apply, and which actions to take on the matches.
- Run on a schedule: Policies execute daily, weekly, or monthly, with the cadence matched to each policy's purpose, so cleanup does not depend on someone remembering to run it.
- Example policy (find unused EBS volumes, tag for cleanup): A typical policy finds unused resources and tags them for cleanup. The first run identifies and marks them; later runs delete whatever is still marked once a grace period has passed, so the cleanup is staged.
- Multi-cloud support: Cloud Custodian supports AWS, Azure, GCP, and others. Multi-cloud teams can use one tool with similarly structured policies across providers, which consolidates operations.
- Lambda-based execution: Some policies run as Lambda functions (or the equivalent on other clouds). These are event-driven: they trigger on resource events, so the response is fast.
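The staged pattern from the list above (mark on the first run, delete after a grace period) can be sketched as a pair of policies. The snippet below builds them as Python dictionaries that mirror the YAML policy layout and prints them as JSON. The policy names and the 7-day grace period are illustrative assumptions, not recommendations, and the filter and action names should be checked against the Cloud Custodian reference for the installed version.

```python
import json

# Hypothetical grace period; pick whatever the team has agreed on.
GRACE_DAYS = 7


def staged_ebs_policies(grace_days=GRACE_DAYS):
    """Build a mark-then-delete policy pair for unattached EBS volumes."""
    mark = {
        "name": "ebs-unattached-mark",  # illustrative name
        "resource": "aws.ebs",
        "filters": [
            {"Attachments": []},            # volume is attached to nothing
            {"tag:maid_status": "absent"},  # not already marked by an earlier run
        ],
        "actions": [
            # Tags the volume with the operation and the date it becomes due.
            {"type": "mark-for-op", "op": "delete", "days": grace_days},
        ],
    }
    delete = {
        "name": "ebs-unattached-delete",  # illustrative name
        "resource": "aws.ebs",
        "filters": [
            {"Attachments": []},                        # still unattached
            {"type": "marked-for-op", "op": "delete"},  # grace period has elapsed
        ],
        "actions": ["delete"],
    }
    return {"policies": [mark, delete]}


if __name__ == "__main__":
    # JSON is a subset of YAML, so this output can be saved as a policy file.
    print(json.dumps(staged_ebs_policies(), indent=2))
```

Re-checking `Attachments` in the second policy means a volume that was reattached during the grace period is left alone. Running `custodian validate` on the saved file before first use is a cheap sanity check.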
Setup is a bounded, one-time effort. The practice stays consistent afterward: policies live in version control, CI/CD applies them, and the same workflow scales as the policy set grows.
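Scheduled and event-driven execution, as described in the setup list, is expressed in a policy's `mode` block. The sketch below attaches a `periodic` or `cloudtrail` mode to an existing policy dictionary. The role ARN, the `Owner` tag, and the helper names are placeholders introduced for illustration; the mode keys should be verified against the Cloud Custodian Lambda documentation.

```python
import copy

# Placeholder ARN; replace with a role that exists in the target account.
EXAMPLE_ROLE = "arn:aws:iam::123456789012:role/custodian-lambda"


def with_periodic_mode(policy, schedule="rate(1 day)", role=EXAMPLE_ROLE):
    """Return a copy of the policy that runs as a scheduled Lambda."""
    scheduled = copy.deepcopy(policy)
    scheduled["mode"] = {"type": "periodic", "schedule": schedule, "role": role}
    return scheduled


def with_cloudtrail_mode(policy, events, role=EXAMPLE_ROLE):
    """Return a copy of the policy that runs when the given API events occur."""
    triggered = copy.deepcopy(policy)
    triggered["mode"] = {"type": "cloudtrail", "events": list(events), "role": role}
    return triggered


# Hypothetical base policy: EC2 instances missing an "Owner" tag.
base_policy = {
    "name": "ec2-missing-owner-tag",
    "resource": "aws.ec2",
    "filters": [{"tag:Owner": "absent"}],
}

daily = with_periodic_mode(base_policy)                       # scheduled sweep
on_launch = with_cloudtrail_mode(base_policy, ["RunInstances"])  # event-driven
```

Both helpers copy the policy rather than mutate it, so the same base definition can be deployed once as a daily sweep and once as an on-launch check.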
Policies
Common policies cover recurring cleanup needs. Untagged resources, idle infrastructure, and old snapshots are all policy candidates, and the team's library grows over time.
- Untagged resources (common): Resources without required tags violate the team's tagging standard. Cloud Custodian finds them and tags them for review, which enforces the standard.
- Idle NAT gateways: NAT gateways that have seen no traffic in a defined window are candidates for cleanup. The cost savings are direct, and the policy catches forgotten infrastructure.
- Old snapshots: EBS snapshots, RDS snapshots, and similar artifacts accumulate. Deleting snapshots older than the retention period keeps storage cost bounded.
- Library of community-shared policies: The Cloud Custodian community shares policies. A team can start from that library and adapt the policies to its needs, which makes bootstrapping fast.
- Custom policies: Team-specific patterns become custom policies. Versioning them alongside the rest of the library preserves the team's practices.
The policies are the team's cleanup practice, encoded. Each one captures a recurring cleanup task.
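Two of the policies listed above, untagged resources and old snapshots, might look like the following when written as a small library. This is a minimal sketch: the required tag `Owner`, the `cleanup-review` tag, and the 90-day retention are hypothetical team choices, and the dictionaries mirror the YAML policy layout rather than calling any Cloud Custodian API.

```python
import json

# Hypothetical team conventions.
REQUIRED_TAG = "Owner"
REVIEW_TAG = "cleanup-review"
SNAPSHOT_RETENTION_DAYS = 90


def untagged_ec2_policy(required_tag=REQUIRED_TAG):
    """Tag EC2 instances that lack a required tag so someone can review them."""
    return {
        "name": f"ec2-missing-{required_tag.lower()}-tag",
        "resource": "aws.ec2",
        "filters": [
            {f"tag:{required_tag}": "absent"},  # required tag is missing
            {f"tag:{REVIEW_TAG}": "absent"},    # not yet flagged for review
        ],
        "actions": [
            {
                "type": "tag",
                "key": REVIEW_TAG,
                "value": f"missing required tag {required_tag}",
            }
        ],
    }


def old_snapshot_policy(retention_days=SNAPSHOT_RETENTION_DAYS):
    """Delete EBS snapshots at or past the retention period."""
    return {
        "name": "ebs-snapshot-past-retention",
        "resource": "aws.ebs-snapshot",
        "filters": [{"type": "age", "days": retention_days, "op": "ge"}],
        "actions": ["delete"],
    }


def policy_library():
    """Collect the policies into one document, ready to serialize."""
    return {"policies": [untagged_ec2_policy(), old_snapshot_policy()]}


if __name__ == "__main__":
    print(json.dumps(policy_library(), indent=2))
```

Keeping each policy in its own function makes the parameters (tag name, retention days) explicit and reviewable in version control.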
Audit
Dry runs and audits are the safety net. Dry-run before applying, verify before deleting, and review production resources even when a policy matches them.
- Dry-run mode: Cloud Custodian's dry-run shows what a policy would do without doing it. The team sees which resources match, no actions are applied, and verification takes little effort.
- Verify before applying: Before enabling a policy, the team dry-runs it, reviews the findings, and confirms the intended actions. Only then is the policy applied.
- Don't auto-delete production resources without review (critical): Some policies delete automatically; others only tag for review. Production resources warrant the review step, so match each policy's aggressiveness to the importance of the resources it targets.
- Notify owners: Some policies notify resource owners before acting. Owners can object, which protects legitimate use cases and keeps cleanup collaborative.
- Audit policy effects: The team audits Cloud Custodian's actions: what was deleted, what was tagged, and what was missed. The findings feed back into the policies, which are refined over time.
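Reviewing a dry run, as described in the list above, can be partly automated. A dry run such as `custodian run --dryrun -s out policy.yml` writes the matched resources for each policy under the output directory. The sketch below counts them per policy, assuming the single-region layout `out/<policy-name>/resources.json`; the function name and the `out` directory are illustrative.

```python
import json
from pathlib import Path


def summarize_run(output_dir):
    """Count matched resources per policy in a Custodian output directory.

    Assumes one subdirectory per policy, each holding a resources.json
    file that contains a JSON list of matched resources.
    """
    counts = {}
    for resources_file in sorted(Path(output_dir).glob("*/resources.json")):
        with resources_file.open() as fh:
            counts[resources_file.parent.name] = len(json.load(fh))
    return counts


if __name__ == "__main__":
    # Print a short review table, e.g. before enabling a delete policy.
    for policy_name, matched in summarize_run("out").items():
        print(f"{policy_name}: {matched} resource(s) matched")
```

A sudden jump in a policy's match count is a useful signal to stop and review before the policy is allowed to act, particularly for policies that delete.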
Cleanup with Cloud Custodian is a FinOps practice that pays off through continuous cost optimization. Nova AI Ops integrates with cloud cost data and policy engines, surfaces patterns, and supports the team's cleanup work.