Cloud & Infrastructure Practical By Samson Tanimawo, PhD Published Jun 20, 2026 4 min read

Cluster Autoscaler Tuning: Cost vs Latency

Default cluster-autoscaler settings are conservative. The tuning that catches scale-up bursts without paying for excess capacity.

Scale-up parameters

scale-down-delay-after-add: prevents flapping during burst. 10 minutes default; 5 minutes for fast workloads.

max-node-provision-time: 15 minutes default; reduce if your nodes provision faster.

Scale-down parameters

scale-down-utilization-threshold: 0.5 default. Higher means more aggressive scaling down.

Trade-off: aggressive scale-down lowers cost; produces more frequent re-scheduling.

Workload patterns

Bursty workloads: prioritise scale-up speed.

Steady workloads: prioritise scale-down efficiency.