Per-Pod Resource Monitoring
Per-pod metrics are critical for debugging.
CPU
Per-pod resource monitoring means capturing the right metrics at the right granularity for Kubernetes workloads. Pod-level metrics drive right-sizing, capacity planning, and operational decisions. Without pod-level data, the team cannot see how individual workloads behave inside the cluster.
What pod-level CPU monitoring provides:
- Per-pod CPU usage: CPU usage is captured per pod, so the team sees how much CPU each pod actually consumes and how that consumption varies over time.
- Compare to requests: The pod's CPU request is its reserved allocation. Comparing actual usage to the request reveals over-allocation (usage well below the request, so reserved CPU sits idle) or under-allocation (usage approaching the limit, with a risk of throttling).
- Right-sizing input: The comparison drives right-sizing decisions. Pods that consistently use less than their request can have the request reduced; pods that consistently approach their limit need higher requests or a different optimization.
- Throttling detection: When a pod's CPU usage hits its limit, the kernel throttles it. Per-pod throttling metrics surface this. Throttled pods are slow pods, so the data shows the team where to investigate.
- Aggregation across replicas: Metrics from multiple replicas of the same workload can be aggregated. The per-replica detail is preserved, while the aggregate shows the workload's overall behavior.
CPU monitoring is the foundation. Without it, capacity decisions are guesses.
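The comparisons described above can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: the `PodCpuSample` type, the `classify` and `workload_average` helpers, and the thresholds (50% of request, 90% of limit, 5% throttled periods) are all assumptions chosen for the example, and the samples are assumed to have already been pulled from whatever metrics backend the cluster uses.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class PodCpuSample:
    """Hypothetical container for one pod's CPU data over a time window."""
    pod: str
    usage_cores: list        # CPU usage samples, in cores
    request_cores: float     # CPU request from the pod spec
    limit_cores: float       # CPU limit from the pod spec
    throttled_periods: int   # scheduler periods in which the pod was throttled
    total_periods: int       # scheduler periods elapsed in the window


def classify(sample, low=0.5, high=0.9, throttle_max=0.05):
    """Return a right-sizing hint for one pod (thresholds are illustrative)."""
    avg = mean(sample.usage_cores)
    peak = max(sample.usage_cores)
    throttle_ratio = (
        sample.throttled_periods / sample.total_periods
        if sample.total_periods
        else 0.0
    )
    # Throttled, or peaking close to the limit: under-allocated.
    if throttle_ratio > throttle_max or peak >= high * sample.limit_cores:
        return "raise-or-optimize"
    # Average usage far below the request: idle reservation.
    if avg < low * sample.request_cores:
        return "reduce-request"
    return "ok"


def workload_average(samples):
    """Aggregate replicas: mean of per-pod average usage.

    The per-replica detail stays available in `samples`.
    """
    return mean(mean(s.usage_cores) for s in samples)


idle = PodCpuSample("web-1", [0.1, 0.2, 0.15], 1.0, 2.0, 0, 1000)
hot = PodCpuSample("web-2", [1.5, 1.9, 1.7], 1.0, 2.0, 200, 1000)
print(classify(idle), classify(hot), workload_average([idle, hot]))
```

Checking throttling before idle reservation is a deliberate choice in this sketch: a pod can have a low average and still be throttled during bursts, and that case should not be flagged for a smaller request.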
Memory
Memory monitoring catches leaks before they cause out-of-memory (OOM) kills. The key metric is the per-pod working set; a working set that trends upward over time without a matching increase in traffic is the signal to watch for.
- Per-pod working set: The pod's working set (the actively used memory pages) is captured. The metric reflects memory that is in use rather than reclaimable cache, so it is the relevant number for memory pressure.
- Watch for trending up: A working set that grows steadily without traffic growth indicates a leak. The trend is the signal, and the growth rate determines the urgency.
- Leak: Memory leaks are a recurring failure mode. Application bugs that allocate without freeing produce a growing working set; eventually the container exceeds its memory limit and is killed by the kernel's OOM killer, which Kubernetes reports as OOMKilled.
- OOM precursor: A working set trending toward the limit is the precursor to an OOM kill. Catching the trend early lets the team fix the leak before the OOM kill affects customers.
- Compare to limits: The distance between the working set and the limit is the headroom. Small headroom means an OOM kill is imminent; large headroom is comfortable. The metric guides the operational response.
Memory monitoring catches one of the most disruptive failure modes. Without it, OOMs are surprises; with it, they are preventable.
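As a rough illustration of the trend and headroom checks above, the sketch below fits a least-squares line to working-set samples and extrapolates to the memory limit. The function names and the report fields are hypothetical, the samples are assumed to be already collected as (timestamp in seconds, working set in bytes) pairs, and the linear extrapolation is a deliberately naive assumption. It also does not correct for traffic growth, which would have to be ruled out separately before calling the trend a leak.

```python
def slope_per_second(timestamps, values):
    """Least-squares slope of values over timestamps (bytes per second)."""
    n = len(timestamps)
    if n < 2 or n != len(values):
        raise ValueError("need at least two matching samples")
    t_mean = sum(timestamps) / n
    v_mean = sum(values) / n
    num = sum((t - t_mean) * (v - v_mean) for t, v in zip(timestamps, values))
    den = sum((t - t_mean) ** 2 for t in timestamps)
    if den == 0:
        raise ValueError("timestamps must not all be equal")
    return num / den


def headroom_report(timestamps, working_set_bytes, limit_bytes):
    """Return headroom as a fraction of the limit and a naive time-to-limit."""
    current = working_set_bytes[-1]
    growth = slope_per_second(timestamps, working_set_bytes)
    headroom = (limit_bytes - current) / limit_bytes
    # Only extrapolate when the trend is upward; otherwise there is no ETA.
    eta = (limit_bytes - current) / growth if growth > 0 else None
    return {"headroom": headroom, "growth_bps": growth, "eta_seconds": eta}


# Made-up samples: one reading per minute, growing by 60 bytes each time.
report = headroom_report([0, 60, 120, 180], [100, 160, 220, 280], 400)
print(report)
```

The growth rate gives the urgency and the headroom gives the distance; an alert would typically fire on the estimated time to the limit rather than on either number alone.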
Custom
Beyond CPU and memory, application-specific metrics provide a service-level signal. Queue depth, request rate, and other custom application metrics reveal what the workload is doing at the application layer.
- App-specific metrics: Each application has metrics relevant to its purpose. A web service has request rate; a queue consumer has queue depth; a batch processor has job count. The metrics match the application.
- Queue depth: For a queue consumer, queue depth is the saturation signal. A growing queue means the consumer cannot keep up, and the metric drives capacity decisions.
- Request rate: For a web service, request rate is the load signal. Rate, latency, and error rate together produce the standard observability picture, and capacity planning uses the rate.
- Service-level signal: Custom metrics are the service-level signal. CPU and memory show resource usage; custom metrics show what the service is actually doing.
- HPA targets: The Horizontal Pod Autoscaler (HPA) can target custom metrics. The autoscaler then scales on queue depth, request rate, or other application-relevant signals, so the scaling matches the workload.
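To make the scaling behavior concrete, the sketch below reproduces the replica calculation described in the Kubernetes HPA documentation: the desired count is the current count multiplied by the ratio of the current metric value to the target value, rounded up, with no change when the ratio is within a tolerance of 1.0. The function name and its parameters are assumptions for illustration. The real autoscaler also applies minimum and maximum replica bounds, stabilization windows, and handling for pods that are not ready, none of which is modeled here.

```python
import math


def desired_replicas(current_replicas, current_value, target_value, tolerance=0.1):
    """Approximate the HPA replica calculation for a single metric.

    current_value is the observed per-pod average of the metric (for example,
    queue depth per consumer); target_value is the per-pod target. The 0.1
    tolerance mirrors the documented default and is treated here as an
    assumption about the cluster's configuration.
    """
    if target_value <= 0:
        raise ValueError("target_value must be positive")
    ratio = current_value / target_value
    # Skip scaling when the metric is close enough to the target.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return max(1, math.ceil(current_replicas * ratio))


# Queue depth per pod is three times the target, so the replica count triples.
print(desired_replicas(4, 300, 100))
# Within tolerance: no change.
print(desired_replicas(4, 105, 100))
```

Because the calculation is proportional, the choice of target value matters more than the choice of metric: a target set too low causes constant scale-ups, while a target set too high leaves the queue growing before the autoscaler reacts.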
Resource monitoring per pod is one of those Kubernetes operations disciplines that pays off in capacity planning and incident response. Nova AI Ops integrates with cluster telemetry, surfaces per-pod metrics, and produces the per-workload visibility that the platform team uses to operate the cluster effectively.