Ephemeral Storage Limits
Ephemeral storage limits keep one pod from filling a node's disk.
Set
Ephemeral storage is the node-local disk space pods use for temporary files, logs, emptyDir volumes, and each container's writable layer. Without limits, a single misbehaving pod can fill the node's disk and make the entire node unhealthy, affecting every other pod scheduled there. Limits stop that cascade; the discipline is bounding ephemeral storage explicitly.
What setting limits looks like:
- resources.requests.ephemeral-storage and resources.limits.ephemeral-storage.: Each container declares its ephemeral storage request and limit. The scheduler uses the request to place the pod on a node with enough allocatable ephemeral storage; the limit is the maximum the container can use, and the kubelet enforces it by evicting the pod.
- Default unlimited.: Without an explicit limit (or a namespace LimitRange that supplies a default), ephemeral storage is unbounded. The pod can consume the node's entire disk; nothing in its spec prevents it.
- Bound it.: The team sets explicit ephemeral-storage limits on every pod. The bound prevents one pod from disrupting the node; the discipline is universal.
- Per-pod calculation.: Different pods need different storage. A logging-heavy pod needs more; a stateless API pod needs little. The limits match the workload's actual needs.
- Aggregate at the node level.: The sum of pod limits should be less than the node's allocatable ephemeral storage. The scheduler uses requests for placement; the kubelet on each node enforces limits.
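The per-container settings and the node-level check described above can be sketched in Python. This is a minimal illustration, assuming a manifest built by hand as a dict (Kubernetes accepts JSON manifests) and a simplified quantity parser that only handles whole numbers with Ki/Mi/Gi/Ti or k/M/G/T suffixes. The helper names (`to_bytes`, `container`, `pod`, `fits_node`), pod names, and image names are all hypothetical.

```python
import json
from typing import Dict, List

# Simplified quantity parser (assumption): whole numbers with an optional
# binary (Ki, Mi, Gi, Ti) or decimal (k, M, G, T) suffix. Real Kubernetes
# quantities also allow fractions, exponents, and other suffixes.
_SUFFIXES: Dict[str, int] = {
    "Ki": 1024, "Mi": 1024**2, "Gi": 1024**3, "Ti": 1024**4,
    "k": 1000, "M": 1000**2, "G": 1000**3, "T": 1000**4,
}

def to_bytes(quantity: str) -> int:
    for suffix, factor in _SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # plain byte count

def container(name: str, image: str, request: str, limit: str) -> dict:
    # ephemeral-storage is declared per container, under resources.requests
    # (used by the scheduler) and resources.limits (enforced by the kubelet).
    return {
        "name": name,
        "image": image,
        "resources": {
            "requests": {"ephemeral-storage": request},
            "limits": {"ephemeral-storage": limit},
        },
    }

def pod(name: str, containers: List[dict]) -> dict:
    return {"apiVersion": "v1", "kind": "Pod",
            "metadata": {"name": name}, "spec": {"containers": containers}}

def pod_limit_bytes(p: dict) -> int:
    # A pod's effective limit is the sum of its containers' limits.
    return sum(to_bytes(c["resources"]["limits"]["ephemeral-storage"])
               for c in p["spec"]["containers"])

def fits_node(pods: List[dict], node_allocatable: str) -> bool:
    # Node-level aggregate: total pod limits stay within allocatable storage.
    return sum(pod_limit_bytes(p) for p in pods) <= to_bytes(node_allocatable)

# A stateless API pod needs little; a logging-heavy pod needs more.
api = pod("api", [container("api", "example.com/api:1.0", "256Mi", "512Mi")])
logger = pod("log-shipper",
             [container("shipper", "example.com/shipper:1.0", "2Gi", "4Gi")])
print(json.dumps(api, indent=2))
print(fits_node([api, logger], "20Gi"))
```

Serializing to JSON keeps the example dependency-free; in practice the same `resources` block would usually live in a YAML manifest or a Helm chart.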
Setting the limits is the foundation. Without explicit limits, the workload has unbounded blast radius.
Eviction
When a pod exceeds its ephemeral-storage limit, Kubernetes evicts it. The eviction is the enforcement; without it, the limit is advisory.
- Pods over limit get evicted.: The kubelet monitors ephemeral storage usage. When a pod exceeds its limit, the kubelet evicts it. The eviction frees the disk; the node stays healthy.
- Catches log spam.: A pod that suddenly produces excessive logs (a bug, a debug-level change, an infinite loop) can fill the disk. The eviction stops the bleeding; the disk space is reclaimed.
- Large temp files.: Some workloads produce large temporary files. Without ephemeral-storage limits, the temp files can fill the node; with limits, the workload is bounded.
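- Eviction signal in postmortems.: When a pod is evicted for ephemeral storage, the kubelet records an Evicted event and sets that reason in the pod's status. Postmortems can reference the event; the team understands what happened. Events expire (the API server keeps them for one hour by default), so the signal needs to be captured before it ages out.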
- Tune limits to avoid eviction.: If pods are repeatedly evicted, the limits may be too tight. The team adjusts; the discipline is bidirectional (catch leaks, support legitimate use).
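The eviction decision can be approximated with a small model. This sketch is loosely modeled on the kubelet's local-storage checks and assumes three of them, applied in this order: an emptyDir volume over its sizeLimit, a pod whose total usage (container writable layers, logs, and emptyDir volumes) exceeds the sum of its container limits, and a single container over its own limit. It is a simplified model for reasoning about evictions, not the kubelet's code; the class and function names are hypothetical, and the pod-level check here runs only when every container declares a limit.

```python
from dataclasses import dataclass
from typing import List, Optional

GiB = 1024**3

@dataclass
class ContainerUsage:
    name: str
    used_bytes: int             # writable layer + logs (simplified)
    limit_bytes: Optional[int]  # None = no ephemeral-storage limit set

@dataclass
class EmptyDirUsage:
    name: str
    used_bytes: int
    size_limit_bytes: Optional[int]  # None = no sizeLimit on the volume

def eviction_reason(containers: List[ContainerUsage],
                    empty_dirs: List[EmptyDirUsage]) -> Optional[str]:
    """Return why this simplified model would evict the pod, or None."""
    # 1. Any emptyDir volume over its sizeLimit.
    for vol in empty_dirs:
        if vol.size_limit_bytes is not None and vol.used_bytes > vol.size_limit_bytes:
            return f"emptyDir {vol.name} exceeds its sizeLimit"
    # 2. Pod total (containers + emptyDir volumes) over the sum of container
    #    limits. Simplification: only checked when every container has a limit.
    if containers and all(c.limit_bytes is not None for c in containers):
        pod_limit = sum(c.limit_bytes for c in containers)
        pod_used = (sum(c.used_bytes for c in containers)
                    + sum(v.used_bytes for v in empty_dirs))
        if pod_used > pod_limit:
            return "pod ephemeral storage exceeds the sum of container limits"
    # 3. Any single container over its own limit.
    for c in containers:
        if c.limit_bytes is not None and c.used_bytes > c.limit_bytes:
            return f"container {c.name} exceeds its ephemeral-storage limit"
    # Unlimited containers are never evicted by these checks.
    return None

# A log-spamming container: 3 GiB written against a 1 GiB limit.
print(eviction_reason([ContainerUsage("api", 3 * GiB, 1 * GiB)],
                      [EmptyDirUsage("scratch", 0, 2 * GiB)]))
```

The last check in the model illustrates the "default unlimited" problem: a container with no limit can grow without triggering any of these per-pod checks.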
Eviction is the enforcement mechanism. Without it, the limit produces no behavior change; with it, the limit is real.
Review
The limits should match actual usage. Periodic review keeps them aligned with reality: over-tight limits cause unnecessary evictions, and over-loose limits do not protect the node.
- Per service: typical usage.: The team observes the typical ephemeral storage usage per service. The number is the baseline; the limit is set above it.
- Set limits 2x typical.: The 2x factor provides headroom for occasional bursts. Most pods stay under the limit; occasional spikes are accommodated; sustained excess produces eviction.
- Track eviction frequency.: The team tracks how often pods are evicted for ephemeral storage. Frequent evictions indicate too-tight limits or a leak; the data drives the decision.
- Update as workloads change.: Workload changes can shift ephemeral storage usage, for example new features that produce more logs or new dependencies that create more temp files. The limits update with the workloads.
- Document the rationale.: Each pod's limit has a rationale. The team's documentation captures why the limit is what it is; future maintainers understand the choice.
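The review loop can be reduced to a small function per service. This sketch assumes "typical usage" means the median of observed usage samples, applies the 2x headroom rule, and flags a service for tuning. The function names, the eviction threshold (`max_evictions`), and the 4x "too loose" factor are hypothetical defaults for illustration, not values from the source.

```python
import statistics
from typing import List, Tuple

MiB = 1024**2

def typical_usage(samples: List[int]) -> float:
    # Assumption: "typical" = median of observed ephemeral storage usage.
    if not samples:
        raise ValueError("need at least one usage sample")
    return statistics.median(samples)

def suggested_limit(samples: List[int], headroom: float = 2.0) -> int:
    # The 2x rule: limit = headroom * typical usage.
    return int(typical_usage(samples) * headroom)

def review(current_limit: int, samples: List[int], evictions: int,
           max_evictions: int = 2, loose_factor: float = 4.0) -> Tuple[int, str]:
    """Return (suggested limit, action). Thresholds are hypothetical."""
    target = suggested_limit(samples)
    if evictions > max_evictions:
        # Frequent evictions: the limit is too tight, or the workload leaks.
        return target, "investigate: frequent evictions mean a tight limit or a leak"
    if current_limit > loose_factor * typical_usage(samples):
        # Far above typical usage: the limit no longer protects the node.
        return target, "tighten: limit is far above typical usage"
    return target, "ok"

samples = [180 * MiB, 200 * MiB, 220 * MiB, 210 * MiB, 190 * MiB]
print(review(current_limit=2048 * MiB, samples=samples, evictions=0))
```

Taking the median keeps a single spike from inflating the baseline; a team that wants the limit to track bursts more closely could swap in a high percentile instead.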
Ephemeral storage management is one of those Kubernetes operational disciplines that prevents a class of node-level incidents. Nova AI Ops integrates with cluster telemetry, surfaces ephemeral storage usage and eviction patterns, and produces the per-pod tuning queue that the platform team uses.