Velero for K8s Backup
Velero backs up Kubernetes cluster state. The pattern has three parts: decide the scope, prove the restore, and set the schedule.
Scope
Velero is an open-source backup and restore tool for Kubernetes. The discipline is configuring it to capture what matters and testing its restore capability. Without testing, the backup is unproven; with testing, it is operationally trustworthy.
What Velero's scope covers:
- Resources (Deployments, Services, and so on): Velero captures the cluster's Kubernetes resources. Deployments, Services, ConfigMaps, Secrets, and custom resources are all part of the backup.
- PV snapshots are optional: Persistent volume snapshots are configurable. Some teams snapshot all PVs, some snapshot none, and most snapshot only the workloads where the data matters.
- Configurable: The scope is set per backup, so different backups can have different scopes to match what the team needs.
- Namespace-scoped backups: Velero supports per-namespace backups. Different namespaces can have different backup schedules and retention, which supports per-team or per-application requirements.
- Selector-based: Backups can be scoped by label selectors. Production workloads are backed up and sandbox workloads are not, so backup coverage matches the data's value.
The scope is the foundation. Without deliberate scope, the backup either captures too much (cost) or too little (gaps).
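The scope choices above can be sketched as Velero Backup objects built in Python. This is a minimal sketch, not a definitive setup: the backup names, namespaces, and labels are hypothetical, and it assumes Velero is installed in its default `velero` namespace. The `spec` fields used (`includedNamespaces`, `labelSelector`, `snapshotVolumes`, `ttl`) are fields of the `velero.io/v1` Backup resource.

```python
import json


def backup_manifest(name, namespaces, match_labels, snapshot_volumes, ttl="720h0m0s"):
    """Build a velero.io/v1 Backup object as a plain dict."""
    return {
        "apiVersion": "velero.io/v1",
        "kind": "Backup",
        # Assumption: Velero runs in its default "velero" namespace.
        "metadata": {"name": name, "namespace": "velero"},
        "spec": {
            # Namespace scope and label scope are both applied.
            "includedNamespaces": list(namespaces),
            "labelSelector": {"matchLabels": dict(match_labels)},
            # PV snapshots are opt-in or opt-out per backup.
            "snapshotVolumes": snapshot_volumes,
            "ttl": ttl,
        },
    }


# Hypothetical scopes: stateful production namespaces get PV snapshots,
# a config-only platform namespace does not.
prod = backup_manifest(
    "prod-daily", ["payments", "orders"], {"tier": "production"}, True
)
config_only = backup_manifest(
    "platform-config", ["platform"], {"backup": "enabled"}, False
)

# kubectl accepts JSON as well as YAML, so this output can be applied directly.
print(json.dumps(prod, indent=2))
```

Keeping each scope as a separate, named object makes it easier to review what is and is not covered, which is where gaps usually hide.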
Restore
The restore is what makes the backup valuable. Without testing the restore, the backup is theatrical; with testing, the team's recovery capability is real.
- Disaster recovery (restore to a new cluster): The primary use is disaster recovery. A new cluster is created, Velero restores the backup, the workloads come up, and the system is recovered.
- Tested quarterly: The restore is tested every quarter. The test is structured: create a new cluster, restore the backup, verify the workloads, and document the time taken and any issues. This is what verifies the capability.
- Selective restore: Velero supports restoring specific resources or namespaces. The team does not have to restore everything; targeted restores fit specific recovery scenarios.
- Cross-cluster restore: Velero can restore to a different cluster than the source. Migration between clusters and recovery to a standby cluster both use the same capability.
- Document the procedure: The restore procedure is written down so that new team members can perform it, institutional knowledge is preserved, and the next test starts from a documented baseline.
The restore is what produces the value. Backups without restores are wasted storage.
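A full and a selective restore can be sketched as command lines assembled in Python. This is an illustrative sketch: the restore, backup, and namespace names are hypothetical, and the helper only builds the argument list rather than running anything. The flags used (`--from-backup`, `--include-namespaces`, `--namespace-mappings`) are options of `velero restore create`.

```python
import shlex


def restore_command(restore_name, backup_name, namespaces=(), namespace_mappings=None):
    """Return the argv for `velero restore create`, without running it."""
    argv = ["velero", "restore", "create", restore_name, "--from-backup", backup_name]
    if namespaces:
        # Selective restore: only the listed namespaces are restored.
        argv += ["--include-namespaces", ",".join(namespaces)]
    if namespace_mappings:
        # Restore into a differently named namespace (source:destination).
        pairs = ",".join(f"{src}:{dst}" for src, dst in namespace_mappings.items())
        argv += ["--namespace-mappings", pairs]
    return argv


# Hypothetical names: a full restore for a DR drill, and a targeted restore
# of one namespace into a side-by-side copy.
full = restore_command("dr-drill-full", "prod-daily")
selective = restore_command(
    "orders-only",
    "prod-daily",
    namespaces=["orders"],
    namespace_mappings={"orders": "orders-restored"},
)

print(shlex.join(full))
print(shlex.join(selective))
```

For a cross-cluster drill, the same command would be run against the new cluster, assuming its Velero installation points at the backup storage location that holds the backup.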
Schedule
The backup schedule matches recovery requirements. Daily is standard; some teams need more frequent backups and some need fewer. The retention policy keeps backups available for the recovery windows the team needs.
- Daily backups: Daily is the typical schedule. The worst-case data loss is roughly the last 24 hours of changes; anything older can be recovered from a backup.
- 30-day retention: The default retention is 30 days (Velero's default backup TTL is 720 hours). Most recovery scenarios fall within 30 days, so the backup is still available when it is needed.
- Adequate for most needs: The daily-with-30-day pattern fits most teams. Specific use cases (compliance retention, DR drills) might warrant adjustment.
- Tiered retention is possible: Keep daily backups for 30 days, weekly backups for 90 days, and monthly backups for a year. Tiered retention provides flexibility without unbounded storage cost.
- Track backup health: Each backup's success is tracked and failed backups raise alerts, so the team's backup capability is verified continuously rather than discovered during recovery.
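The tiered retention and health tracking described above can be sketched in Python. This is a sketch under stated assumptions: the schedule names and cron times are hypothetical, and the sample backup list is hand-written to mirror the shape of `kubectl get backups.velero.io -n velero -o json` items rather than taken from a real cluster. The `--schedule` and `--ttl` flags belong to `velero schedule create`, and the phases checked are Backup status phases.

```python
# Retention tiers from the list above; names and cron times are assumptions.
TIERS = [
    ("daily", "0 2 * * *", 30),     # every day at 02:00, kept 30 days
    ("weekly", "0 3 * * 0", 90),    # Sundays at 03:00, kept 90 days
    ("monthly", "0 4 1 * *", 365),  # 1st of the month at 04:00, kept a year
]

# Backup status phases that should raise an alert.
FAILED_PHASES = {"Failed", "PartiallyFailed", "FailedValidation"}


def schedule_command(name, cron, retention_days):
    """Return the argv for `velero schedule create` with the TTL in hours."""
    ttl = f"{retention_days * 24}h0m0s"
    return ["velero", "schedule", "create", name, "--schedule", cron, "--ttl", ttl]


def failing_backups(backups):
    """Return the names of Backup objects whose status.phase is a failure."""
    return [
        b["metadata"]["name"]
        for b in backups
        if b.get("status", {}).get("phase") in FAILED_PHASES
    ]


commands = [schedule_command(*tier) for tier in TIERS]

# Hand-written sample data, not real cluster output.
sample = [
    {"metadata": {"name": "daily-1"}, "status": {"phase": "Completed"}},
    {"metadata": {"name": "daily-2"}, "status": {"phase": "PartiallyFailed"}},
    {"metadata": {"name": "daily-3"}, "status": {"phase": "Failed"}},
]

for argv in commands:
    print(" ".join(argv))
print("alert on:", failing_backups(sample))
```

Treating `PartiallyFailed` as an alert is a deliberate choice here: a backup that skipped some items may be missing exactly the resource a later restore needs.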
Velero backup is one of those Kubernetes operational disciplines that pay off in disaster recovery scenarios. Nova AI Ops integrates with Velero and similar backup tools, surfaces backup health and restore-test results, and produces the per-cluster recovery readiness view that the platform team uses.