PodDisruptionBudget Testing
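PDBs are often configured but rarely tested. This article covers how to test them, how to simulate a drain safely, and how to keep them under review.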
Test
PodDisruptionBudget (PDB) testing is the discipline of verifying that PDBs actually do what they are supposed to do. A PDB that looks correct on paper may not protect workloads in practice; testing verifies the protection before it matters.
What testing looks like:
- Manually drain a node: The team picks a node and runs kubectl drain. The drain attempts to evict the node's pods, the PDB limits how many can be evicted at the same time, and the team observes the actual behavior.
- Verify that the PDB is enforced: Pods protected by a PDB should not all be evicted at once. If the drain proceeds against the PDB's intent, the PDB is misconfigured and the team investigates the cause.
- Catch misconfigured PDBs: Common misconfigurations are selectors that target the wrong pods, minAvailable set too low, and replica counts that do not match the PDB's assumptions. The test catches each of these.
- Use a non-production node: The test runs on a non-production node first. Issues surface there, and the production rollout follows after validation.
- Coordinate with on-call: The drain produces real workload movement, so the on-call engineer is told in advance and is ready to respond if needed.
Testing is what verifies the PDB. Without the test, the PDB's effectiveness is assumed; with the test, it is demonstrated.
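The misconfigurations listed above can also be checked offline before anyone drains a node. The following is a minimal Python sketch, not the Kubernetes implementation: the Pdb class and the selects, allowed_disruptions, and findings functions are hypothetical names, the selector is reduced to matchLabels, budgets are absolute counts (no percentages), and every matched pod is assumed to be healthy.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Pdb:
    """Simplified, hypothetical model of a PodDisruptionBudget."""
    name: str
    match_labels: Dict[str, str]           # matchLabels only, no matchExpressions
    min_available: Optional[int] = None    # absolute count, percentages not modelled
    max_unavailable: Optional[int] = None  # absolute count, percentages not modelled


def selects(pdb: Pdb, pod_labels: Dict[str, str]) -> bool:
    """True when every matchLabels pair is present on the pod."""
    return all(pod_labels.get(k) == v for k, v in pdb.match_labels.items())


def allowed_disruptions(pdb: Pdb, healthy: int, expected: int) -> int:
    """Evictions the budget permits right now; never negative."""
    if pdb.min_available is not None:
        return max(0, healthy - pdb.min_available)
    if pdb.max_unavailable is not None:
        # The budget wants at least (expected - max_unavailable) healthy pods.
        return max(0, healthy - (expected - pdb.max_unavailable))
    return 0


def findings(pdb: Pdb, pods: List[Dict[str, str]]) -> List[str]:
    """Report likely misconfigurations; assumes all matched pods are healthy."""
    matched = [labels for labels in pods if selects(pdb, labels)]
    if not matched:
        return ["selector matches no pods"]
    count = len(matched)
    allowed = allowed_disruptions(pdb, healthy=count, expected=count)
    problems = []
    if allowed == 0:
        problems.append("no evictions allowed: a drain would block")
    if allowed == count:
        problems.append("every pod may be evicted at once")
    return problems
```

For three pods labelled app=web, a PDB selecting app=wbe is reported as matching nothing, a minAvailable of 3 as blocking every eviction, and a minAvailable of 0 as protecting nothing. A real drain is still needed to confirm what the cluster does.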
Simulate
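Simulation comes before the real test. kubectl drain with --dry-run shows what would happen without doing it, so the team can check its expectations before committing. Current kubectl versions expect an explicit mode: --dry-run=client only lists the pods the drain would try to evict, while --dry-run=server submits the eviction requests to the API server without persisting them.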
- Use kubectl drain --dry-run: Dry-run mode shows the planned drain without executing it. The output lists the pods that would be evicted; with server-side dry-run, evictions that a PDB would reject should surface as errors, which is worth confirming on your cluster version.
- Surface what would happen: The team sees the expected behavior. If the drain plans to evict more pods than the PDB should allow, the PDB has a problem and the team investigates before the real drain.
- Iterate before committing: The team can iterate on the PDB configuration with dry-run feedback: change the PDB, rerun the dry run, verify, and repeat until the behavior is correct.
- Lower risk than a real drain: The dry run produces no actual disruption, so mistakes made while exploring do not affect production.
- Combine with explicit testing: Dry run plus a real test gives complete validation. Use the dry run for ongoing verification and the real test for periodic deep validation; together they cover the discipline.
The simulation is the safe layer. It exposes PDB issues without producing actual disruption.
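A small wrapper can make the dry run repeatable while the team iterates on a PDB. This is a minimal Python sketch under stated assumptions: drain_command and run_dry_run are hypothetical helper names, kubectl is assumed to be on the PATH and pointed at the right cluster, and --ignore-daemonsets is included because drains typically need it when DaemonSet pods are present.

```python
import subprocess
from typing import List


def drain_command(node: str, dry_run: str = "server") -> List[str]:
    """Build a kubectl drain invocation that never performs a real drain."""
    # kubectl also accepts --dry-run=none, which would drain for real,
    # so this helper deliberately refuses anything but client or server.
    if dry_run not in ("client", "server"):
        raise ValueError("dry_run must be 'client' or 'server'")
    return [
        "kubectl", "drain", node,
        "--ignore-daemonsets",
        f"--dry-run={dry_run}",
    ]


def run_dry_run(node: str, dry_run: str = "server") -> str:
    """Run the dry-run drain and return its combined output for review."""
    result = subprocess.run(
        drain_command(node, dry_run),
        capture_output=True,
        text=True,
        check=False,  # a non-zero exit is something to read, not to crash on
    )
    return result.stdout + result.stderr
```

Calling run_dry_run("node-1") after each PDB change gives the change, dry run, verify loop a single entry point; node-1 is a placeholder node name.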
Review
PDBs need periodic review. Configurations drift; new workloads get launched; the PDB inventory grows. The review keeps the protection effective.
- Test your PDBs annually: Once per year, the team tests every important PDB. Any PDB that has not been tested in the past year is a candidate for the next round.
- Without testing, they're theatre: An untested PDB has unknown reliability. The first time it is exercised in a real disruption (cluster upgrade, node failure, voluntary maintenance), it might fail, and the cost is real disruption to workloads.
- Inventory the PDBs: The team maintains an inventory of PDBs. New PDBs are added, old ones are reviewed, and abandoned ones are cleaned up.
- Correlate with workloads: Each PDB protects specific workloads. The mapping is documented so that the protection is intentional and gaps (workloads that need a PDB but do not have one) are identified.
- Update as workloads change: When a workload's replica count or topology changes, its PDB might need adjustment. The review catches these cases and keeps the PDB aligned with the workload's reality.
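The review steps above amount to three queries over an inventory: which workloads have no PDB, which PDBs match no workload, and which PDBs have not been tested within a year. The following is a minimal Python sketch with hypothetical names (PdbRecord, Workload, uncovered, orphaned, stale); it assumes the inventory is already available as plain records, reduces selectors to matchLabels, and ignores namespaces.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Dict, List, Optional


@dataclass
class PdbRecord:
    """Hypothetical inventory entry for one PDB."""
    name: str
    match_labels: Dict[str, str]
    last_tested: Optional[date]  # None means the PDB was never tested


@dataclass
class Workload:
    """Hypothetical inventory entry for one workload."""
    name: str
    pod_labels: Dict[str, str]


def covers(pdb: PdbRecord, workload: Workload) -> bool:
    """True when the PDB's matchLabels all appear on the workload's pods."""
    return all(workload.pod_labels.get(k) == v for k, v in pdb.match_labels.items())


def uncovered(workloads: List[Workload], pdbs: List[PdbRecord]) -> List[str]:
    """Workloads that no PDB selects: the coverage gaps."""
    return [w.name for w in workloads if not any(covers(p, w) for p in pdbs)]


def orphaned(pdbs: List[PdbRecord], workloads: List[Workload]) -> List[str]:
    """PDBs that select no workload: candidates for cleanup."""
    return [p.name for p in pdbs if not any(covers(p, w) for w in workloads)]


def stale(pdbs: List[PdbRecord], today: date,
          max_age: timedelta = timedelta(days=365)) -> List[str]:
    """PDBs never tested, or last tested longer ago than max_age."""
    return [
        p.name for p in pdbs
        if p.last_tested is None or today - p.last_tested > max_age
    ]
```

The one-year default for max_age mirrors the annual cadence described above; a team with more frequent cluster upgrades might choose a shorter window.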
PDB testing is one of those Kubernetes operational disciplines that pays off when cluster maintenance happens. Nova AI Ops integrates with cluster telemetry, surfaces PDB coverage and test history, and produces the per-workload disruption-readiness view that the platform team uses.