Kubernetes Resource Limits Done Right
Setting resource requests and limits in Kubernetes seems trivial. Setting them well takes ten incidents and a few thousand dollars in over-provisioned nodes to learn.
Why requests and limits
Requests are a reservation the scheduler honours ("guarantee me this much"). Limits are a hard cap ("never let me have more than this"). Set neither and you get the BestEffort QoS class, first to be killed under pressure. Set them well and you control how the cluster behaves at the edge.
The scheduling impact. Requests determine where pods get scheduled. The scheduler places a pod only on nodes whose unreserved capacity (allocatable minus the sum of existing requests) covers the pod's requests. Without requests, the scheduler can't make informed decisions; pods land on saturated nodes; performance is unpredictable.
The eviction impact. Limits and QoS determine what happens under pressure. Pods exceeding their limits get throttled (CPU) or OOMKilled (memory), and when a node runs short, the kubelet evicts lower-QoS pods first. Without limits, a single misbehaving pod can consume all node resources, starving its neighbours.
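For concreteness, a minimal pod spec with both set (names, image, and numbers are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web                    # illustrative name
spec:
  containers:
  - name: app
    image: example/app:1.0     # placeholder image
    resources:
      requests:                # what the scheduler reserves on the node
        cpu: "500m"
        memory: "256Mi"
      limits:                  # hard caps: CPU is throttled, memory is OOMKilled
        cpu: "1"
        memory: "512Mi"
```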
The three QoS classes
- Guaranteed: requests = limits. Highest priority; least likely to be evicted.
- Burstable: requests < limits. Most workloads. Get reserved minimum, can burst higher when nodes have headroom.
- BestEffort: no requests or limits. First to be killed when nodes are pressured.
The Guaranteed class is the strictest. The pod's CPU and memory requests equal its limits, so from the scheduler's point of view that capacity is spoken for: no new pod can be placed against it, even when the Guaranteed pod isn't using it. Trade-off: predictable performance, no bursting.
The Burstable class is the workhorse. Most production pods are Burstable: they have a reserved minimum (request) and can burst up to a ceiling (limit) when nodes have spare capacity. Most efficient resource usage; some performance variability.
The BestEffort class is the scavenger. No requests or limits; pods scheduled wherever space allows; first to be killed under pressure. Suitable for: batch jobs, opportunistic workloads, anything where eviction is acceptable.
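As resource stanzas, with illustrative numbers. Each belongs under spec.containers[].resources of a separate pod; QoS is computed per pod, so these are three alternatives, not one spec:

```yaml
# Guaranteed: requests = limits for every container, both CPU and memory
resources:
  requests: { cpu: "1", memory: "512Mi" }
  limits:   { cpu: "1", memory: "512Mi" }
---
# Burstable: requests set, with higher (or unset) limits
resources:
  requests: { cpu: "250m", memory: "256Mi" }
  limits:   { cpu: "1", memory: "1Gi" }
---
# BestEffort: no requests or limits anywhere in the pod
resources: {}
```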
Right-sizing
Measure CPU and memory under real load over a week. Set requests = p50, limits = p99 + 30%. Iterate after one production cycle. Most teams over-provision by 2-4x because they sized once and never re-measured.
The over-provisioning pattern. Engineer sizes pods at the high end during initial deployment ("just to be safe"). Production usage settles at 30-40% of the provisioned size. The team is paying for capacity the pods never use; the cluster is larger than it needs to be.
The right-sizing exercise. Pull pod metrics for the last week. Compute p50 and p99 of CPU and memory per pod. Set request to p50 (the typical usage), limit to p99 + 30% (covers spikes plus margin). Apply; observe; iterate next month.
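If the Vertical Pod Autoscaler is installed in your cluster, its recommender does the percentile work for you. A minimal sketch in recommendation-only mode (the Deployment name is illustrative; read the suggested values from the VPA's status and apply them yourself):

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-rightsizing
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web                # illustrative target
  updatePolicy:
    updateMode: "Off"        # recommend only; never mutates running pods
```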
The savings at scale. A team running 200 pods, over-provisioned by 3x, is paying for ~600 pods' worth of capacity. Right-sizing to actual usage frees ~400 pods' worth of cluster capacity, which can be reused for other workloads or released by shrinking the cluster. Annualised, that's $50k-$200k in cluster cost savings.
The anti-pattern
Setting limits = requests for everything (Guaranteed QoS for all) sounds safe but produces over-allocated clusters. Most workloads can burst safely; pin only the latency-critical ones to Guaranteed.
The pathology. Engineer sets requests = limits because "Guaranteed is best." Pod with 1 CPU request and 1 CPU limit can never burst. When it needs 1.5 CPU briefly, it's throttled. Cluster has unused capacity but the pod can't use it.
The right approach. Reserve Guaranteed for latency-critical workloads (high-traffic web servers, real-time processing). Use Burstable for everything else (background workers, internal services, latency-tolerant services). The mixed approach maximises cluster efficiency.
Memory limits and OOMKill
CPU limits throttle. Memory limits OOMKill. The day a memory limit kicks in, your pod restarts and you lose state. Always set memory limits 30%+ above the observed p99; never use them as cost control.
The OOMKill mechanism. The container hits its memory limit; the kernel's OOM killer terminates the process; the kubelet restarts the container per the pod's restartPolicy. Any in-memory state (cache, sessions, in-flight work) is lost. For stateless services, the impact is minor; for stateful ones, it can be a customer-visible incident.
The over-tight memory limits trap. Engineer sets memory limit at p95 to "save costs." Production traffic occasionally exceeds p95; pods get OOMKilled randomly; cascading restarts cause incidents. Always include 30%+ headroom above p99; the savings from tight limits don't justify the incident risk.
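With hypothetical numbers: a service whose measured p50 memory is 400Mi and p99 is 600Mi would be sized roughly like this:

```yaml
resources:
  requests:
    memory: "400Mi"    # p50: the typical working set
  limits:
    memory: "800Mi"    # p99 (600Mi) + ~30% headroom; crossing this means OOMKill
```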
CPU throttling vs. limits
CPU limits don't kill; they throttle. A pod that hits its CPU limit is paused for the rest of the scheduling period (100ms by default under the kernel's CFS quota mechanism). If it's CPU-hungry, it's effectively running at the limit value continuously.
The throttling effect. A pod with a 1 CPU limit can never use more than 1 CPU. If it tries (during a spike), it's slowed down. User-facing latency increases; the spike doesn't translate to faster work.
The CPU-limit trap. Engineer sets CPU limit equal to request because "Guaranteed is safest." During a traffic spike, the pod can't use spare cluster CPU; latency degrades; users complain. The fix: use Burstable QoS (limit > request); the pod can burst when cluster has capacity.
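A sketch of the Burstable fix, with illustrative numbers:

```yaml
resources:
  requests:
    cpu: "500m"    # p50: what the scheduler reserves
  limits:
    cpu: "2"       # 4x headroom: spikes borrow idle node CPU instead of throttling
```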
Common antipatterns
BestEffort in production. Pods without requests or limits; first to be evicted; behaviour is unpredictable. Always set requests and limits in production.
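One guardrail worth knowing: a namespace-level LimitRange applies default requests and limits to any container that omits them, so nothing lands as BestEffort by accident. A minimal sketch (namespace and values are illustrative):

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: default-resources
  namespace: production      # illustrative namespace
spec:
  limits:
  - type: Container
    defaultRequest:          # filled in when a container omits requests
      cpu: "100m"
      memory: "128Mi"
    default:                 # filled in when a container omits limits
      cpu: "500m"
      memory: "512Mi"
```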
Limit equals request universally. Guaranteed for everything; clusters over-allocated; bursting impossible. Reserve Guaranteed for the few critical workloads.
Sizing once at deployment. Initial sizing is a guess; re-measure after a week of production traffic; adjust. Most teams skip the iteration; pods stay over-provisioned forever.
Tight memory limits as cost control. OOMKills cause incidents; the cost savings are smaller than the incident impact. Use cluster-level cost controls (node count, instance type) for cost; pod-level memory limits for safety.
What to do this week
Three moves. (1) Audit your pods for QoS class. Most teams find a mix, with many BestEffort pods that should be Burstable; fix those first. (2) For your highest-footprint pods, pull last week's metrics and apply the right-sizing exercise above. The savings are immediate and real. (3) Verify memory limits sit 30%+ above p99 for stateful services. That headroom is what keeps OOMKills from becoming customer-visible incidents.