Cluster Autoscaler Tuning: Cost vs Latency
Default cluster-autoscaler settings are conservative. The tuning that catches scale-up bursts without paying for excess capacity.
Scale-up parameters
scale-down-delay-after-add: prevents flapping during burst. 10 minutes default; 5 minutes for fast workloads.
max-node-provision-time: 15 minutes default; reduce if your nodes provision faster.
Scale-down parameters
scale-down-utilization-threshold: 0.5 default. Higher means more aggressive scaling down.
Trade-off: aggressive scale-down lowers cost; produces more frequent re-scheduling.
Workload patterns
Bursty workloads: prioritise scale-up speed.
Steady workloads: prioritise scale-down efficiency.