Cluster Cost Optimization 2026
Most K8s clusters waste 30-50% of what they reserve. Finding that waste starts with an audit.
Audit
Cluster cost optimization is the discipline of reducing Kubernetes spending without compromising reliability. The audit identifies waste, right-sizing tools eliminate it, and spot instances add further savings for workloads that tolerate interruption. The work has to be sustained over time, not done once.
What the audit looks like:
- Per pod, requested vs. used: Each pod's request is what it reserves; its actual usage is what it consumes. The gap between them is over-allocation, and the audit measures it.
- Average utilization: The team measures average utilization over time. Some pods are spiky (high peak, low average); others are steady (consistent usage). The pattern matters for right-sizing: a spiky pod sized to its average can be starved of resources at peak, while a steady pod can be sized close to its average.
- Most teams find 30-50% waste: The first audit typically reveals significant waste: requests far above actual usage, idle reservations, and over-provisioned capacity. That waste is recoverable.
- Identify outliers: The pods with the largest gap between request and usage are the optimization targets; right-sizing them produces the largest savings.
- Track over time: Run the audit periodically so waste trends, optimization progress, and new waste from new workloads all stay visible.
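The audit arithmetic itself is small. Below is a minimal sketch, assuming per-pod CPU requests and usage samples have already been exported from your metrics pipeline. The `PodSample` and `PodReport` types, the `audit` function, the 2x peak-to-average "spiky" threshold, and the sample numbers are all hypothetical. The sketch computes each pod's request-vs-average gap, flags spiky pods, reports cluster-wide waste, and ranks the outliers.

```python
from dataclasses import dataclass
from statistics import mean
from typing import NamedTuple


@dataclass
class PodSample:
    """Hypothetical audit input: one pod's CPU request and usage samples."""
    name: str
    cpu_request_m: int       # reserved CPU, in millicores
    cpu_usage_m: list[int]   # observed CPU usage samples over the window, in millicores


class PodReport(NamedTuple):
    name: str
    request_m: int
    avg_m: float
    peak_m: int
    gap_m: float             # request minus average usage = over-allocation
    pattern: str             # "spiky" or "steady"


def audit(pods: list[PodSample], top_n: int = 3, spiky_ratio: float = 2.0):
    """Return (cluster waste %, top_n pods ranked by over-allocation)."""
    reports = []
    for p in pods:
        avg = mean(p.cpu_usage_m)
        peak = max(p.cpu_usage_m)
        # A pod whose peak is far above its average is "spiky" (threshold is an assumption).
        pattern = "spiky" if avg > 0 and peak / avg >= spiky_ratio else "steady"
        reports.append(PodReport(p.name, p.cpu_request_m, avg, peak,
                                 p.cpu_request_m - avg, pattern))

    total_request = sum(r.request_m for r in reports)
    total_used = sum(r.avg_m for r in reports)
    waste_pct = 100 * (total_request - total_used) / total_request if total_request else 0.0

    # Outliers: largest gap between request and average usage first.
    outliers = sorted(reports, key=lambda r: r.gap_m, reverse=True)[:top_n]
    return waste_pct, outliers


if __name__ == "__main__":
    pods = [
        PodSample("api", 500, [200, 250, 300, 250]),
        PodSample("worker", 1000, [100, 900, 200, 200]),
        PodSample("cache", 500, [400, 400, 400, 400]),
    ]
    waste, top = audit(pods)
    print(f"cluster waste: {waste:.1f}%")   # cluster waste: 50.0%
    for r in top:
        print(r.name, r.gap_m, r.pattern)
```

The sample cluster reserves 2000m but averages 1000m of use, so it reports 50% waste. The spiky `worker` pod tops the outlier list. Because it is spiky, its right-sized request should follow a high percentile of usage, not its average.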
The audit is the foundation. Without it, optimization is guesswork.
VPA
The Vertical Pod Autoscaler (VPA) right-sizes pods. It observes actual usage and recommends or applies appropriate requests, so over-allocation shrinks without manual tuning.
- Right-size pods: VPA observes pod usage and adjusts requests. Each pod's request tracks its actual usage, over-allocation shrinks, and the cluster's effective capacity grows.
- Set requests close to actual: The right request is close to actual usage plus some headroom. VPA derives this from observed usage, and its data supplements manual right-sizing.
- Don't over-allocate: Over-allocation is waste. Pods reserve more than they use, the cluster pays for the unused capacity, and right-sizing recovers it.
- Recommendation mode: With updateMode set to "Off", VPA publishes recommendations without applying them. The team reviews them and rolls changes out through the normal deployment process, so change control is preserved.
- Auto mode for some workloads: For workloads that tolerate restarts, VPA can apply its recommendations automatically by evicting pods and recreating them with new requests. Right-sizing then happens without human action.
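One way to make "close to actual usage with some headroom" concrete is to take a high percentile of usage samples and add a margin. This sketch is loosely in the spirit of VPA's percentile-based recommender but is not its algorithm, since VPA works from decaying usage histograms. The function names, the 90th percentile, the 15% headroom, the 10m floor, and the sample data are assumptions.

```python
def percentile_nearest_rank(samples: list[int], q: int) -> int:
    """Nearest-rank percentile for an integer q in 1..100."""
    if not samples:
        raise ValueError("need at least one usage sample")
    if not 1 <= q <= 100:
        raise ValueError("q must be between 1 and 100")
    ordered = sorted(samples)
    rank = -(-q * len(ordered) // 100)   # ceil(q * n / 100) using integer math
    return ordered[rank - 1]


def recommend_request(samples: list[int], q: int = 90,
                      headroom_pct: int = 15, min_request: int = 10) -> int:
    """Hypothetical right-sizing rule: q-th percentile of usage plus headroom (millicores)."""
    base = percentile_nearest_rank(samples, q)
    # Add headroom on top of the percentile, rounding up with integer ceiling.
    with_headroom = -(-base * (100 + headroom_pct) // 100)
    return max(with_headroom, min_request)


if __name__ == "__main__":
    usage = [120, 150, 180, 200, 210, 230, 250, 260, 400, 900]  # hypothetical samples, millicores
    current_request = 2000
    rec = recommend_request(usage)
    print(f"recommended {rec}m vs current {current_request}m")
```

Here the recommendation is 460m against a hypothetical current request of 2000m. The single 900m sample sits above the 90th percentile, so a latency-sensitive pod would need a higher `q` or a limit that leaves room to burst.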
VPA is the right-sizing tool; the discipline is matching its mode to each workload's tolerance for restarts.
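A small generator can record the mode choice for each workload. This sketch assumes the `autoscaling.k8s.io/v1` VerticalPodAutoscaler CRD is installed in the cluster. It emits a manifest as JSON, which `kubectl apply -f` accepts. Recommendation-only workloads get `updateMode: "Off"` and restart-tolerant ones get `"Auto"`. The `vpa_manifest` helper, the `tolerates_restart` flag, and the workload names are hypothetical.

```python
import json


def vpa_manifest(deployment: str, namespace: str, tolerates_restart: bool) -> dict:
    """Build a VerticalPodAutoscaler object targeting a Deployment.

    Restart-tolerant workloads get updateMode "Auto"; everything else gets
    "Off" (recommendations only, applied later through normal deploys).
    """
    return {
        "apiVersion": "autoscaling.k8s.io/v1",
        "kind": "VerticalPodAutoscaler",
        "metadata": {"name": f"{deployment}-vpa", "namespace": namespace},
        "spec": {
            "targetRef": {"apiVersion": "apps/v1", "kind": "Deployment", "name": deployment},
            "updatePolicy": {"updateMode": "Auto" if tolerates_restart else "Off"},
        },
    }


if __name__ == "__main__":
    # Hypothetical workloads: a restart-tolerant batch job and a latency-sensitive API.
    for name, ns, tolerant in [("nightly-etl", "batch", True), ("checkout-api", "prod", False)]:
        print(json.dumps(vpa_manifest(name, ns, tolerant), indent=2))
```

The VPA documentation describes "Auto" as currently equivalent to "Recreate", which evicts pods to apply new requests. Check the available modes against the VPA version you run.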
Spot
Spot instances provide significant savings for fault-tolerant workloads. Batch jobs, stateless workers, and ML training are all candidates, with savings of 60-80% compared to on-demand.
- For batch and stateless workloads: Workloads that tolerate interruption fit spot: batch jobs that checkpoint, stateless workers that retry, and ML training with checkpointing.
- Spot saves 60-80%: The savings are substantial. Spot prices typically run 60-80% below on-demand, and the workload's compute bill drops accordingly.
- Tolerate interruption: AWS can reclaim a spot instance with a two-minute interruption notice. The workload must handle that; checkpointing or retry logic ensures the work still completes.
- Diversify pools: A spot capacity pool (one instance type in one Availability Zone) can run out of capacity. Spreading across multiple instance types and AZs reduces the chance that every pool is unavailable at once.
- Combine with on-demand: Run tolerant workloads on spot and sensitive workloads on on-demand. The mix delivers cost savings where interruptions are acceptable and reliability where they are not.
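The savings from a mixed fleet, and the value of diversifying pools, come down to simple arithmetic. This minimal sketch assumes a flat on-demand price per node-hour and the 60-80% discount band quoted above. The rate, fleet size, 60% spot share, and per-pool outage probabilities are hypothetical. The pool calculation also assumes pools fail independently, which real capacity shortages often violate.

```python
from math import prod


def blended_hourly_cost(on_demand_rate: float, nodes: int,
                        spot_share: float, spot_discount: float) -> float:
    """Hourly cost of a fleet that runs spot_share of its nodes on spot.

    spot_discount is the fraction below the on-demand price (0.7 = 70% cheaper).
    """
    if not (0 <= spot_share <= 1 and 0 <= spot_discount < 1):
        raise ValueError("spot_share must be in [0, 1] and spot_discount in [0, 1)")
    spot_nodes = nodes * spot_share
    on_demand_nodes = nodes - spot_nodes
    spot_rate = on_demand_rate * (1 - spot_discount)
    return on_demand_nodes * on_demand_rate + spot_nodes * spot_rate


def p_all_pools_unavailable(pool_outage_probs: list[float]) -> float:
    """Probability every spot pool is out of capacity at the same time.

    Assumes independent pools; correlated shortages make the real risk higher,
    so treat the result as optimistic.
    """
    return prod(pool_outage_probs)


if __name__ == "__main__":
    rate, nodes = 0.40, 20   # hypothetical on-demand $/node-hour and fleet size
    baseline = blended_hourly_cost(rate, nodes, spot_share=0.0, spot_discount=0.7)
    for discount in (0.6, 0.7, 0.8):
        cost = blended_hourly_cost(rate, nodes, spot_share=0.6, spot_discount=discount)
        print(f"discount {discount:.0%}: ${cost:.2f}/h vs ${baseline:.2f}/h on-demand "
              f"({1 - cost / baseline:.0%} saved)")
    p1 = p_all_pools_unavailable([0.1])
    p3 = p_all_pools_unavailable([0.1, 0.1, 0.1])
    print(f"all pools unavailable: 1 pool {p1:.1%}, 3 pools {p3:.1%}")
```

With 60% of a 20-node fleet on spot, the 60-80% discount band cuts the fleet bill by roughly 36-48%, because the on-demand share still pays full price. Consider three independent pools that are each unavailable 10% of the time. All three are unavailable together only 0.1% of the time.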
Cluster cost optimization is a FinOps discipline whose savings compound over time. Nova AI Ops integrates with cluster cost data, surfaces optimization opportunities, and provides the per-workload visibility that teams use to drive savings.