Kubernetes Resource Limits and Requests: The Math Behind QoS Classes
Resource limits look like a tuning knob. They are actually a scheduling-class declaration that decides whether your pod is the first or last to be evicted.
Requests vs limits, restated
Requests are what the scheduler reserves: enough capacity to fit the pod. Limits are what the kubelet enforces at runtime: more than this and you get throttled (CPU) or killed (memory).
What most teams do not realize: setting these values determines a third thing, the pod's QoS class. The class is what the kubelet uses to decide who to evict when a node runs hot. Same workload, different limits, different eviction priority.
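Concretely, this is where the two numbers live in a pod spec. A minimal sketch; the pod name, image, and figures are placeholders, not recommendations. Because the requests here are lower than the limits, this pod would land in the Burstable class described below.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web            # placeholder name
spec:
  containers:
    - name: app
      image: example/app:1.0   # placeholder image
      resources:
        requests:              # what the scheduler reserves on a node
          cpu: 250m
          memory: 256Mi
        limits:                # what the kubelet enforces at runtime
          cpu: 500m
          memory: 512Mi
```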
The three QoS classes and what they do
- Guaranteed. Every container has request == limit for both CPU and memory. Highest priority; last to be evicted under pressure.
- Burstable. At least one request or limit is set, but the pod does not meet the Guaranteed criteria (request < limit, or only one resource set). Mid-tier eviction; can use more than reserved if capacity is available.
- BestEffort. No requests or limits set on any container. First to be evicted; usually accidental.
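You do not have to work the class out by hand: the kubelet records it on the pod status. A quick check, assuming a pod named web in the current namespace:

```sh
# Prints Guaranteed, Burstable, or BestEffort
kubectl get pod web -o jsonpath='{.status.qosClass}'
```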
When Burstable makes sense
Burstable is the right pick when peak usage is unpredictable but the baseline is steady; most web services fit this profile. Set the request to the 95th-percentile baseline and the limit to 200% of that. The pod gets reserved capacity for the common case and headroom for spikes.
Guaranteed is right for workloads you cannot afford to evict: a primary database, a stateful queue worker. Pay the cost of fully reserved capacity to buy eviction immunity.
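Side by side, the two shapes look like this. The numbers are illustrative: assume the web service's observed 95th-percentile baseline is 500m CPU and 512Mi memory, and the database needs 2 CPU and 4Gi.

```yaml
# Burstable web service: request = observed p95 baseline, limit = 200% of that
resources:
  requests:
    cpu: 500m
    memory: 512Mi
  limits:
    cpu: "1"
    memory: 1Gi

# Guaranteed database: request == limit for both resources, eviction immunity
resources:
  requests:
    cpu: "2"
    memory: 4Gi
  limits:
    cpu: "2"
    memory: 4Gi
```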
Catching OOMKills before they happen
- Memory limits trigger OOMKill silently. The container restarts; the user sees a 502; the on-call sees nothing useful unless they know to look at kubectl describe pod for an OOMKilled status.
- Add an alert on kube_pod_container_status_last_terminated_reason{reason="OOMKilled"}; a sketch of such a rule follows this list. The first OOMKill is the warning; ten in a day means the limit is too tight for the workload.
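A minimal Prometheus alerting rule built on that kube-state-metrics series; the alert name, severity label, and wait window are placeholders to adapt to your setup.

```yaml
groups:
  - name: workload-oom
    rules:
      - alert: ContainerOOMKilled
        # kube-state-metrics sets this series to 1 when a container's last
        # termination reason was OOMKilled
        expr: |
          max by (namespace, pod, container) (
            kube_pod_container_status_last_terminated_reason{reason="OOMKilled"}
          ) == 1
        for: 1m
        labels:
          severity: warning
        annotations:
          summary: "{{ $labels.namespace }}/{{ $labels.pod }} was OOMKilled"
```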
Antipatterns
- No limits at all. One bad pod takes down the node.
- Limits set to match observed peak. Peak grows; you alert on yesterday's number; OOMKill returns.
- CPU limits on latency-sensitive services. Throttling under load makes things worse. Use requests; skip CPU limits for these workloads (see the sketch after this list).
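For the latency-sensitive case, the shape is: a CPU request but no CPU limit, with memory still bounded. A sketch with placeholder values; without a CPU limit the pod is Burstable, but it never hits CFS throttling.

```yaml
resources:
  requests:
    cpu: "1"          # reserved share the scheduler accounts for
    memory: 1Gi
  limits:
    memory: 1Gi       # memory stays bounded; no cpu limit, so no throttling
```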
What to do this week
Three moves:
- Audit your top-10 most-restarted pods and check for OOMKilled status (audit commands sketched below).
- Move latency-sensitive services from CPU-limited to CPU-unlimited, with requests still set.
- Add an OOMKilled alert to your platform dashboard.
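A starting point for the audit, assuming cluster-wide read access; both commands use only stock kubectl.

```sh
# Pods sorted by restart count of their first container (most-restarted last)
kubectl get pods -A --sort-by='.status.containerStatuses[0].restartCount'

# Last termination reason per pod; look for OOMKilled
kubectl get pods -A -o jsonpath='{range .items[*]}{.metadata.namespace}{"/"}{.metadata.name}{"\t"}{.status.containerStatuses[*].lastState.terminated.reason}{"\n"}{end}'
```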