Kubernetes Resource Limits and Requests: The Math Behind QoS Classes
Resource limits look like a tuning knob. They are actually a scheduling-class declaration that decides whether your pod is the first or last to be evicted.
Requests vs limits, restated
Requests are what the scheduler reserves: enough capacity to fit the pod. Limits are what the kubelet enforces at runtime: more than this and you get throttled (CPU) or killed (memory).
What most teams do not realize: setting these values determines a third thing, the pod's QoS class. The class is what the kubelet uses to decide who to evict when a node runs hot. Same workload, different limits, different eviction priority.
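Concretely, this is where the two numbers live in a pod spec. A minimal sketch; the pod name, image, and figures are placeholders, not recommendations. Because the requests here are lower than the limits, this pod would land in the Burstable class described below.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web            # placeholder name
spec:
  containers:
    - name: app
      image: example/app:1.0   # placeholder image
      resources:
        requests:              # what the scheduler reserves on a node
          cpu: 250m
          memory: 256Mi
        limits:                # what the kubelet enforces at runtime
          cpu: 500m
          memory: 512Mi
```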
The three QoS classes and what they do
- Guaranteed. Every container has request == limit for both CPU and memory. Highest priority; last to be evicted under pressure.
- Burstable. At least one request or limit is set, but the pod does not meet the Guaranteed criteria (request < limit, or only one resource set). Mid-tier eviction; can use more than reserved if capacity is available.
- BestEffort. No requests or limits set on any container. First to be evicted; usually accidental.
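You do not have to work the class out by hand: the kubelet records it on the pod status. A quick check, assuming a pod named web in the current namespace:

```sh
# Prints Guaranteed, Burstable, or BestEffort
kubectl get pod web -o jsonpath='{.status.qosClass}'
```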
When Burstable makes sense
Burstable is the right pick when peak usage is unpredictable but the baseline is steady; most web services fit this profile. Set the request to the 95th-percentile baseline and the limit to 200% of that. The pod gets reserved capacity for the common case and headroom for spikes.
Guaranteed is right for workloads you cannot afford to evict: a primary database, a stateful queue worker. Pay the cost of fully reserved capacity to buy eviction immunity.
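Side by side, the two shapes look like this. The numbers are illustrative: assume the web service's observed 95th-percentile baseline is 500m CPU and 512Mi memory, and the database needs 2 CPU and 4Gi.

```yaml
# Burstable web service: request = observed p95 baseline, limit = 200% of that
resources:
  requests:
    cpu: 500m
    memory: 512Mi
  limits:
    cpu: "1"
    memory: 1Gi

# Guaranteed database: request == limit for both resources, eviction immunity
resources:
  requests:
    cpu: "2"
    memory: 4Gi
  limits:
    cpu: "2"
    memory: 4Gi
```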
Catching OOMKills before they happen
- Memory limits trigger OOMKill silently. The container restarts; the user sees a 502; the on-call sees nothing useful unless they know to look at kubectl describe pod for an OOMKilled status.
- Add an alert on kube_pod_container_status_last_terminated_reason{reason="OOMKilled"}; a sketch of such a rule follows this list. The first OOMKill is the warning; ten in a day means the limit is too tight for the workload.
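A minimal Prometheus alerting rule built on that kube-state-metrics series; the alert name, severity label, and wait window are placeholders to adapt to your setup.

```yaml
groups:
  - name: workload-oom
    rules:
      - alert: ContainerOOMKilled
        # kube-state-metrics sets this series to 1 when a container's last
        # termination reason was OOMKilled
        expr: |
          max by (namespace, pod, container) (
            kube_pod_container_status_last_terminated_reason{reason="OOMKilled"}
          ) == 1
        for: 1m
        labels:
          severity: warning
        annotations:
          summary: "{{ $labels.namespace }}/{{ $labels.pod }} was OOMKilled"
```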
Antipatterns
- No limits at all. One bad pod takes down the node.
- Limits set to match observed peak. Peak grows; you alert on yesterday's number; OOMKill returns.
- CPU limits on latency-sensitive services. Throttling under load makes things worse. Use requests; skip CPU limits for these workloads (see the sketch after this list).
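For the latency-sensitive case, the shape is: a CPU request but no CPU limit, with memory still bounded. A sketch with placeholder values; without a CPU limit the pod is Burstable, but it never hits CFS throttling.

```yaml
resources:
  requests:
    cpu: "1"          # reserved share the scheduler accounts for
    memory: 1Gi
  limits:
    memory: 1Gi       # memory stays bounded; no cpu limit, so no throttling
```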
What to do this week
Three moves:
- Audit your top-10 most-restarted pods and check for OOMKilled status (audit commands sketched below).
- Move latency-sensitive services from CPU-limited to CPU-unlimited, with requests still set.
- Add an OOMKilled alert to your platform dashboard.
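A starting point for the audit, assuming cluster-wide read access; both commands use only stock kubectl.

```sh
# Pods sorted by restart count of their first container (most-restarted last)
kubectl get pods -A --sort-by='.status.containerStatuses[0].restartCount'

# Last termination reason per pod; look for OOMKilled
kubectl get pods -A -o jsonpath='{range .items[*]}{.metadata.namespace}{"/"}{.metadata.name}{"\t"}{.status.containerStatuses[*].lastState.terminated.reason}{"\n"}{end}'
```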