Pod Topology Spread Constraints
Beyond affinity: topology spread. The pattern.
Idea
Pod topology spread distributes pods across topology domains such as zones, nodes, or regions. The mechanism is more flexible than pod anti-affinity: instead of a binary rule, it bounds how uneven the distribution may become, so the workload's placement can be tuned to operational needs.
What the idea looks like:
- topologySpreadConstraints distributes pods across topology domains: The pod spec carries a topologySpreadConstraints list. Each constraint names a topology key, a label selector for the pods to count, and a maximum skew; the scheduler applies the constraints when it places new pods.
- Anti-affinity is binary: Pod anti-affinity says "do not place these pods together" (pod affinity says "place them together"); either way, the rule holds or it does not. Topology spread is more nuanced.
- Topology spread is gradual: maxSkew defines the acceptable imbalance, that is, the largest allowed difference between the number of matching pods in any one domain and the number in the least-loaded domain. With maxSkew 1, zone counts of 3/3/2 are acceptable but 4/3/2 is not.
- Per topology key: Different topology keys (zone, hostname, region) can have different constraints. Spread evenly across zones and avoid too many pods on one node; the constraints stack, and a pod must satisfy all of the hard ones.
- Soft and hard constraints: whenUnsatisfiable is either DoNotSchedule (hard, the default) or ScheduleAnyway (soft). Hard constraints can leave pods unschedulable; soft constraints are best-effort.
The idea is more flexible than anti-affinity. The gradual constraints fit more workloads.
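As a rough illustration of stacked constraints, the pod manifest below sketches a hard zone constraint next to a soft per-node constraint. The pod name, the `app: web` label, and the image are placeholders assumed for this example; the topology keys are the well-known node labels, which the cluster's nodes are assumed to carry.

```yaml
# Sketch only: name, labels, and image are hypothetical placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: web-example
  labels:
    app: web
spec:
  topologySpreadConstraints:
    # Hard: per-zone counts of app=web pods may differ by at most 1.
    - maxSkew: 1
      topologyKey: topology.kubernetes.io/zone
      whenUnsatisfiable: DoNotSchedule
      labelSelector:
        matchLabels:
          app: web
    # Soft: prefer not to pile pods onto one node, but never block scheduling.
    - maxSkew: 2
      topologyKey: kubernetes.io/hostname
      whenUnsatisfiable: ScheduleAnyway
      labelSelector:
        matchLabels:
          app: web
  containers:
    - name: web
      image: registry.example.com/web:1.0
```

In practice the same `topologySpreadConstraints` list would usually sit in a Deployment's pod template rather than in a bare Pod.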
Use
The typical use is zone-aware high availability. Spread pods across zones evenly; fewer pods per zone means less impact from a zone failure.
- Spread across zones evenly: A workload with 9 replicas across 3 zones should have 3 replicas per zone. A zone constraint with maxSkew 1 produces this at scheduling time, provided every zone has capacity; the distribution is intentional rather than accidental.
- Fewer pods per zone means less risk: A zone failure takes out the pods in the failed zone. With an even spread over 3 zones, a zone failure removes about a third of the replicas, so the loss is bounded and the workload stays available.
- Zone-aware HA: The discipline produces zone-aware high availability. The workload is intentionally distributed; the operational characteristics are predictable.
- Combine with PDB: Topology spread plus a PodDisruptionBudget (PDB) covers more failure modes than either alone. Spread distributes pods across domains; the PDB bounds simultaneous voluntary disruption such as node drains; together they help the workload survive both zone failures and maintenance.
- Calibrate maxSkew: maxSkew determines the acceptable imbalance. Too tight a value, combined with a hard constraint, produces scheduling failures; too loose a value loses the distribution benefit. Calibration matters.
Zone-aware HA is the typical use. The discipline pays off when zones genuinely fail.
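The 9-replica, 3-zone case combined with a PDB might look like the sketch below. The names, the `app: web` label, the image, and the `minAvailable: 6` value are assumptions chosen for illustration, not recommendations.

```yaml
# Sketch only: names, labels, image, and thresholds are hypothetical.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 9
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      topologySpreadConstraints:
        # With 3 zones and maxSkew 1, 9 replicas settle at 3 per zone
        # when every zone has room.
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              app: web
      containers:
        - name: web
          image: registry.example.com/web:1.0
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-pdb
spec:
  # Voluntary evictions (for example node drains) are refused
  # once only 6 healthy pods remain.
  minAvailable: 6
  selector:
    matchLabels:
      app: web
```

With these assumed numbers, losing one zone leaves 6 of 9 replicas, and the PDB then blocks further voluntary evictions until replacements are running.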
Avoid
Some configurations produce predictable problems. Hard constraints with limited topology can prevent scheduling; the discipline includes recognizing and avoiding these patterns.
- Hard maxSkew with limited topology: If maxSkew is 1 with DoNotSchedule and there are only 2 zones, the scheduler has little room to work. A 2/1 placement is fine, but if the emptier zone has no free capacity, the next pod cannot go to the fuller zone without pushing the skew to 2. The strict constraint cannot be satisfied; the pod stays Pending.
- Pods may not schedule: The scheduling failure is a real operational problem. The pods stay Pending; the workload runs below its intended capacity; the alerts fire.
- Use ScheduleAnyway for resilience: ScheduleAnyway treats the constraint as a preference. The scheduler favors placements that reduce skew but still places the pod when the constraint cannot be met, so the constraint itself never blocks scheduling.
- Test the configuration: Before relying on topology spread, test it. Drain a zone, verify that the workload behaves as expected, and confirm that the configuration matches the intent.
- Document the choice: Document the topology spread configuration: why these constraints, what the math is, and how the workload responds. Future maintainers can then understand and safely change it.
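One way to combine the soft constraint with in-place documentation is to keep the rationale as comments next to the constraint. The fragment below is a sketch of a pod template's spec section; the `app: web` label and the replica math in the comments are assumptions for illustration.

```yaml
# Fragment of a pod template spec (hypothetical app=web workload).
# Why:      keep replicas roughly even across zones without letting this
#           constraint block a rollout or a recovery.
# Math:     9 replicas over 3 zones targets 3/3/3; maxSkew 1 expresses a
#           preference for at most a one-pod difference between zones.
# Response: if a zone is full or unavailable, pods are placed in the
#           remaining zones instead of staying Pending on this constraint.
topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: ScheduleAnyway
    labelSelector:
      matchLabels:
        app: web
```

The trade-off is that a soft constraint can drift out of balance under pressure, so the drain test described above is still worth running against it.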
Pod topology spread is one of those Kubernetes scheduling disciplines that improves availability when configured correctly. Nova AI Ops integrates with cluster scheduling and pod distribution telemetry, surfaces patterns, and supports the team's scheduling decisions.