Resource Overcommit Strategy
The gap between requests and limits is the overcommit. Here is the strategy for managing it.
Ratio
Resource overcommit is the practice of allowing the pod limits on a node to sum to more than the node's actual capacity. The math works because pods rarely all burst at the same time, so the cluster gets more density than provisioning every pod for its peak would allow. The trade-off is the risk that enough pods do burst at once to exhaust the node.
What the ratio looks like:
- Requests are the guarantee: The pod's resource request is what Kubernetes reserves for it. The scheduler only places the pod on a node with enough unreserved allocatable capacity to cover its request, so the pod can count on at least that much.
- Limits are the burst ceiling: The pod's limit is the maximum it can use. The pod can burst above its request up to its limit, and the kernel enforces the limit through cgroups (CPU use above the limit is throttled; memory use above the limit triggers an OOM kill).
- 2 to 3x overcommit is typical: The overcommit ratio is the sum of pod limits divided by the sum of pod requests. 2 to 3x is typical for production clusters, because the workloads usually do not all burst at once.
- The math relies on statistics: The probability of all pods bursting simultaneously is low, so expected utilization is far below the sum of the limits; the overcommit buys density without routine contention.
- Different workloads support different ratios: Steady-state workloads tolerate higher overcommit; bursty workloads need lower ratios. The team's analysis matches the ratio to each workload's pattern.
The ratio is the lever. Higher ratios produce more density; lower ratios produce more stability.
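As a minimal sketch, the ratios discussed in this section can be computed from a node's pod specs. The `PodResources` type, the choice of MiB as the unit, and the sample values are hypothetical; in a real cluster the numbers would come from the pod specs and the node's allocatable capacity. `requests_fit` is a simplified stand-in for the scheduler's fit check, not its actual implementation.

```python
from dataclasses import dataclass


@dataclass
class PodResources:
    """Hypothetical per-pod summary: memory request and limit in MiB."""
    name: str
    request_mib: int
    limit_mib: int


def overcommit_ratio(pods: list[PodResources]) -> float:
    """Sum of limits divided by sum of requests."""
    total_requests = sum(p.request_mib for p in pods)
    if total_requests <= 0:
        raise ValueError("pods must declare positive requests")
    return sum(p.limit_mib for p in pods) / total_requests


def limits_to_capacity(pods: list[PodResources], allocatable_mib: int) -> float:
    """Sum of limits divided by node allocatable; above 1.0 the node is overcommitted."""
    return sum(p.limit_mib for p in pods) / allocatable_mib


def requests_fit(pods: list[PodResources], allocatable_mib: int) -> bool:
    """Simplified scheduler rule: total requests must fit in allocatable capacity."""
    return sum(p.request_mib for p in pods) <= allocatable_mib


if __name__ == "__main__":
    pods = [
        PodResources("api", request_mib=512, limit_mib=1536),
        PodResources("worker", request_mib=1024, limit_mib=2048),
        PodResources("cache", request_mib=2048, limit_mib=4096),
    ]
    node_allocatable_mib = 4096
    print(f"requests fit: {requests_fit(pods, node_allocatable_mib)}")
    print(f"limits / requests: {overcommit_ratio(pods):.2f}")
    print(f"limits / allocatable: {limits_to_capacity(pods, node_allocatable_mib):.2f}")
```

With these sample values the requests (3,584 MiB) fit the 4,096 MiB node, while the limits total 7,680 MiB, a limits-to-requests ratio of about 2.1x.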
Risk
The risk is the simultaneous burst. When multiple pods burst at the same time, the node may not have capacity; some pods are throttled or killed.
- Multiple bursts at once cause OOM: When memory bursts collide, the node runs out of memory. The kernel OOM killer terminates containers (or the kubelet evicts pods), the workload is disrupted, and the customer impact is real.
- Saturation alerts catch this: Saturation alerts (queue depth, memory pressure, CPU throttling rate) catch the situation. The team is notified before the OOM happens or just as it starts, so the response can be timely.
- The risk is statistical: It depends on workload patterns. Workloads with correlated traffic (everything bursts at the same time) carry higher risk; workloads with independent traffic carry lower risk.
- CPU vs memory: CPU overcommit is safer than memory overcommit. CPU is compressible, so contention slows pods down (throttling); memory is not, so contention kills them (OOM). Memory overcommit deserves more caution.
- Production tolerance: Production should be more conservative. The cost of an OOM is real, and the savings from aggressive overcommit may not justify the risk, so production typically runs lower ratios than dev.
The risk is the cost of overcommit. Acceptable risk depends on workload tolerance for occasional disruption.
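The statistical risk can be made concrete with a toy model. The assumptions are ours, not part of the strategy above: every pod has the same memory request and limit, a bursting pod uses its full limit, an idle pod uses only its request, and each pod bursts with probability `p` in a given interval. With independent bursts the overflow probability is a binomial tail; with perfectly correlated bursts, every pod bursts whenever one does. All function names are hypothetical.

```python
from math import comb


def min_bursts_to_overflow(n: int, request: int, limit: int, capacity: int) -> int:
    """Smallest number of simultaneous bursts that pushes usage past capacity."""
    if limit <= request:
        raise ValueError("limit must exceed request for a burst to add usage")
    headroom = capacity - n * request
    if headroom < 0:
        raise ValueError("requests exceed capacity; the scheduler would not place these pods")
    # Usage with k bursting pods is n*request + k*(limit - request).
    return headroom // (limit - request) + 1


def overflow_prob_independent(n, request, limit, capacity, p):
    """P(at least k_min of n pods burst) when bursts are independent (binomial tail)."""
    k_min = min_bursts_to_overflow(n, request, limit, capacity)
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))


def overflow_prob_correlated(n, request, limit, capacity, p):
    """Perfectly correlated bursts: all pods burst together with probability p."""
    k_min = min_bursts_to_overflow(n, request, limit, capacity)
    return p if n >= k_min else 0.0


if __name__ == "__main__":
    # Hypothetical node: 10 pods, 1 GiB request, 3 GiB limit, 16 GiB available.
    args = dict(n=10, request=1024, limit=3072, capacity=16384, p=0.1)
    print(f"independent: {overflow_prob_independent(**args):.4f}")
    print(f"correlated:  {overflow_prob_correlated(**args):.4f}")
```

With these sample numbers, 4 of the 10 pods must burst together to exceed the node. Independent bursts overflow in about 1.3% of intervals, while perfectly correlated bursts overflow in 10% of them: the same overcommit ratio with roughly eight times the risk.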
Monitor
Monitoring is what catches the contention. Without monitoring, the overcommit's risk is invisible until it manifests as customer impact; with monitoring, the team responds proactively.
- Per-node memory pressure: Each node's memory pressure is monitored. When pressure rises, the team is alerted and the contention is caught early.
- Eviction events: Pod evictions due to memory pressure indicate that the node is contended. The events are captured, and their patterns drive the team's response.
- Surfacing overcommit issues: Monitoring surfaces situations where overcommit is causing problems. The data drives the decision: tighten the ratio, reschedule pods, or add capacity.
- Trend over time: Contention frequency is tracked over time. Rising contention indicates the overcommit is too aggressive, and the team adjusts.
- Per-workload analysis: Some workloads burst together (correlated); others do not. Monitoring reveals which workloads contribute most to contention, and that analysis drives placement decisions.
Resource overcommit is one of those Kubernetes operational disciplines that produces real density gains when managed well. Nova AI Ops integrates with cluster telemetry, surfaces overcommit patterns and contention, and produces the operational visibility that the platform team uses to tune the ratio.