Pod Evictability and Toleration
Some pods should not be evicted. These are the patterns for controlling that.
Tolerations
Pod evictability is the discipline of controlling which pods can be evicted under various conditions. The team's choices about tolerations, priority, and explicit evict-blocking shape the cluster's behavior during disruptions and resource pressure.
What tolerations provide:
- Allow scheduling on tainted nodes: A tainted node repels pods that do not tolerate the taint. A pod with a matching toleration may be scheduled there; the toleration is the explicit grant.
- Critical pods on dedicated nodes: Dedicate certain nodes with taints and give only critical workloads the matching tolerations, so regular workloads cannot land there and compete for resources. A toleration permits placement but does not require it, so pair it with a node selector or node affinity when the critical pods must run only on those nodes.
- Per-purpose nodes: GPU nodes for ML workloads, high-memory nodes for caches, dedicated database nodes. Each purpose has its own taint; only pods with the matching toleration schedule there.
- Tolerations are explicit: The team decides which pods tolerate which taints. Because tolerations are declared in the pod spec, they can be audited, and a pod without a matching toleration is kept off the tainted node.
- NoSchedule, NoExecute, PreferNoSchedule: The taint's effect determines the behavior. NoSchedule keeps new non-tolerating pods off the node; NoExecute does that and also evicts non-tolerating pods already running there; PreferNoSchedule is a soft constraint that the scheduler tries to honor. Choose the effect that matches the use case.
Tolerations are the discipline for placement. The team's policy controls which workloads can run where.
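The matching rules behind the list above can be sketched in a few lines of Python. This is a simplified model of how a toleration is compared against a taint, not the scheduler's actual code; the class names, the can_schedule helper, and the dedicated=gpu taint are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Taint:
    key: str
    effect: str  # "NoSchedule", "PreferNoSchedule", or "NoExecute"
    value: str = ""


@dataclass(frozen=True)
class Toleration:
    key: str = ""            # empty key with operator "Exists" matches every taint key
    operator: str = "Equal"  # "Equal" (the default) or "Exists"
    value: str = ""
    effect: str = ""         # empty effect matches every effect


def tolerates(tol: Toleration, taint: Taint) -> bool:
    """Return True when this toleration matches this taint."""
    if tol.effect and tol.effect != taint.effect:
        return False
    if tol.operator == "Exists":
        return tol.key == "" or tol.key == taint.key
    # Operator "Equal": both key and value must match.
    return tol.key == taint.key and tol.value == taint.value


def can_schedule(tolerations, taints) -> bool:
    """True when every hard taint on the node is tolerated.

    PreferNoSchedule is a soft preference, so it never blocks placement here.
    """
    hard = [t for t in taints if t.effect in ("NoSchedule", "NoExecute")]
    return all(any(tolerates(tol, t) for tol in tolerations) for t in hard)


# Hypothetical GPU node pool tainted dedicated=gpu:NoSchedule.
gpu_node_taints = [Taint(key="dedicated", effect="NoSchedule", value="gpu")]
ml_pod = [Toleration(key="dedicated", operator="Equal", value="gpu", effect="NoSchedule")]
web_pod = []  # no tolerations

ml_allowed = can_schedule(ml_pod, gpu_node_taints)    # ML pod may land on the GPU node
web_allowed = can_schedule(web_pod, gpu_node_taints)  # web pod is repelled
```

The sketch only answers whether a pod is allowed on a node. Steering the pod toward that node still needs a node selector or node affinity in the pod spec.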
Priority
Priority classes influence which pods are protected during resource pressure. Higher-priority pods are favored; lower-priority pods are preempted or evicted first when capacity is tight.
- High-priority pods are evicted last: When a node is under pressure, the kubelet considers pod priority when ranking pods for eviction, alongside whether each pod's usage exceeds its requests. Among pods that are over their requests, lower-priority pods go first, so high-priority workloads that stay within their requests are the last candidates.
- Combine with tolerations: Tolerations control where pods schedule; priority controls which pods stay. Together they produce comprehensive workload protection.
- Document the priority hierarchy: The team documents which workloads have which priority class. The hierarchy reflects business importance; the documentation supports operational decisions.
- Avoid priority inflation: When everyone wants high priority, priority loses its meaning. The team's policy bounds priority class assignments; only genuinely critical workloads get high priority.
- System priority for system pods: Kubernetes ships with system priority classes (system-cluster-critical, system-node-critical). These are reserved for system pods; the team does not use them for application pods.
Priority is the protection discipline. The team's choices about priority shape what survives resource pressure.
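To make the ordering concrete, here is a rough Python sketch of how node-pressure eviction ranks pods. It follows the kubelet's documented criteria (usage above requests, then priority, then how far usage exceeds requests) but considers memory only and ignores many details; PodUsage, eviction_order, and all the numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PodUsage:
    name: str
    priority: int        # value of the pod's priority class
    memory_request: int  # MiB requested
    memory_usage: int    # MiB currently used


def eviction_order(pods):
    """Sort pods from first-evicted to last-evicted under memory pressure."""
    def rank(pod):
        overage = pod.memory_usage - pod.memory_request
        exceeds_request = overage > 0
        # Pods over their request sort first, then lower priority,
        # then the largest overage.
        return (not exceeds_request, pod.priority, -overage)
    return sorted(pods, key=rank)


pods = [
    PodUsage("batch", priority=0, memory_request=512, memory_usage=400),
    PodUsage("api", priority=1000, memory_request=512, memory_usage=700),
    PodUsage("cache", priority=0, memory_request=512, memory_usage=900),
]
order = [p.name for p in eviction_order(pods)]
```

In this example the high-priority api pod is ranked ahead of the low-priority batch pod because api is over its request and batch is not. Priority protects a workload most reliably when its requests are sized to cover its real usage.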
Not evictable
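Some pods should not be evicted by automated tooling. The cluster-autoscaler.kubernetes.io/safe-to-evict: "false" annotation marks them; the Cluster Autoscaler respects it and will not scale down a node that hosts such a pod. A plain kubectl drain does not read this annotation and honors PodDisruptionBudgets instead, so drain protection needs a budget as well. The discipline is using this sparingly.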
- safe-to-evict: false annotation: The Cluster Autoscaler checks this annotation (full key cluster-autoscaler.kubernetes.io/safe-to-evict) before scaling a node down. Pods marked "false" are not evicted by the autoscaler; the workload is protected. Other tools honor it only if they explicitly check for it.
- Use sparingly: The annotation is powerful and operationally heavy. Too many pods marked unevictable produce nodes that cannot be removed; cluster operations become harder.
- Blocks node removal: A node hosting such a pod cannot be scaled down by the autoscaler, and maintenance tooling that honors the annotation cannot drain it cleanly. The team must either move the pods manually or skip the operation; cluster maintenance is harder.
- Document the reason: Each unevictable pod has a documented reason, such as a specific stateful workload or a specific operational requirement; the documentation justifies the annotation.
- Periodic review: The team reviews unevictable pods periodically. Are they still needed? Can the workload be made evictable through better design? The review prevents accumulation.
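A periodic review like the one described above can be partly automated. The following Python sketch assumes pod objects shaped like the items returned by kubectl get pods -A -o json; the reason annotation key (example.com/unevictable-reason) is a hypothetical team convention, not a Kubernetes feature, and audit_unevictable is an illustrative helper name.

```python
from collections import defaultdict

SAFE_TO_EVICT = "cluster-autoscaler.kubernetes.io/safe-to-evict"
REASON = "example.com/unevictable-reason"  # hypothetical team convention


def is_marked_unevictable(pod: dict) -> bool:
    """True when the pod carries safe-to-evict: "false"."""
    annotations = pod.get("metadata", {}).get("annotations") or {}
    return annotations.get(SAFE_TO_EVICT) == "false"


def audit_unevictable(pods):
    """Return (node -> marked pods, marked pods with no documented reason)."""
    blocked_nodes = defaultdict(list)
    undocumented = []
    for pod in pods:
        if not is_marked_unevictable(pod):
            continue
        meta = pod["metadata"]
        name = f'{meta.get("namespace", "default")}/{meta["name"]}'
        node = pod.get("spec", {}).get("nodeName", "<unscheduled>")
        blocked_nodes[node].append(name)
        if not (meta.get("annotations") or {}).get(REASON):
            undocumented.append(name)
    return dict(blocked_nodes), undocumented


# Illustrative input; real data would come from the cluster API.
sample_pods = [
    {"metadata": {"name": "ledger-0", "namespace": "payments",
                  "annotations": {SAFE_TO_EVICT: "false",
                                  REASON: "single-writer database"}},
     "spec": {"nodeName": "node-a"}},
    {"metadata": {"name": "scratch", "namespace": "dev",
                  "annotations": {SAFE_TO_EVICT: "false"}},
     "spec": {"nodeName": "node-a"}},
    {"metadata": {"name": "web-1", "namespace": "shop"},
     "spec": {"nodeName": "node-b"}},
]
blocked, missing_reason = audit_unevictable(sample_pods)
```

Here blocked lists the nodes the autoscaler cannot remove and which pods are responsible, while missing_reason lists the pods that need either a documented justification or removal of the annotation.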
Pod evictability is one of those Kubernetes operational disciplines that pays off across many disruptions. Nova AI Ops integrates with cluster telemetry, surfaces evictability patterns, and produces the per-workload visibility that the platform team uses to manage cluster operations effectively.