HPA Tuning for Real Workloads

Default HPA settings are conservative. This note covers the tuning that catches bursts.

Metrics

HPA scales only as well as the metric pointed at it. CPU utilisation is the customary default and the wrong choice for most user-facing services. Requests per second (RPS), queue depth, in-flight requests, or p95 latency track real load far better than CPU does.
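To make the effect of the metric concrete, here is a minimal sketch of the replica calculation HPA documents: desired replicas = ceil(current replicas x current metric / target metric), skipped when the ratio is within a tolerance (0.1 by default). The function name and the RPS figures are illustrative assumptions, not part of any Kubernetes API.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric, tolerance=0.1):
    """Sketch of the documented HPA formula (hypothetical helper, not a real API).

    current_metric and target_metric are per-pod averages, e.g. RPS per pod.
    """
    ratio = current_metric / target_metric
    # HPA takes no action when the ratio is close enough to 1.0.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)


# Assumed example: 4 pods each serving 150 RPS against a 100 RPS target.
print(desired_replicas(4, 150, 100))
# Assumed example: 105 RPS per pod is inside the tolerance band, so no change.
print(desired_replicas(4, 105, 100))
```

The same arithmetic applies whatever the metric is, which is why a metric that moves with user load gives a replica count that moves with user load.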

Thresholds

Threshold tuning is asymmetric. Scale up fast: set the target at 60-70 percent utilisation so replicas are added while headroom remains. Scale down slowly: use a 5-minute stabilisation window so transient dips do not cause flapping.
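The asymmetry can be sketched as a small simulation, assuming the documented scale-down behaviour in which HPA acts on the highest recommendation seen inside the stabilisation window. The class and method names are hypothetical; this models the idea and is not the controller's code.

```python
from collections import deque


class ScaleDownStabilizer:
    """Toy model: scale up at once, scale down only to the window's highest recommendation."""

    def __init__(self, window_seconds=300):
        self.window_seconds = window_seconds
        self._history = deque()  # (timestamp_seconds, recommended_replicas)

    def stabilize(self, now, current_replicas, recommendation):
        self._history.append((now, recommendation))
        # Forget recommendations that have aged out of the window.
        cutoff = now - self.window_seconds
        while self._history and self._history[0][0] < cutoff:
            self._history.popleft()
        if recommendation >= current_replicas:
            return recommendation  # adding capacity is not delayed in this sketch
        # Removing capacity: never go below the highest recent recommendation.
        highest_recent = max(r for _, r in self._history)
        return min(current_replicas, highest_recent)


s = ScaleDownStabilizer(window_seconds=300)
print(s.stabilize(0, 10, 10))    # steady state
print(s.stabilize(60, 10, 4))    # transient dip, still held by the window
print(s.stabilize(400, 10, 4))   # dip has outlasted the window
```

A dip shorter than the window never reduces the replica count, which is what prevents flapping.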

Avoid

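Three failure modes recur. Aggressive scale-down causes flapping; stabilisation windows left untuned mishandle bursty workloads; and unreviewed minReplicas and maxReplicas bounds produce surprises at both ends: a floor too low to absorb the first burst (or scale-to-zero, where the cluster enables it) and a ceiling high enough to allow runaway scale-up.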
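One way to catch these before deployment is a small pre-flight check over the HPA spec. The sketch below reads the autoscaling/v2 field names minReplicas, maxReplicas, and behavior.scaleDown.stabilizationWindowSeconds from a plain dict; the function name and every threshold are assumptions chosen for illustration and should be set per workload.

```python
def audit_hpa_spec(spec, min_floor=2, max_growth=10, min_down_window=300):
    """Return warnings for a dict shaped like an autoscaling/v2 HPA spec.

    min_floor, max_growth and min_down_window are hypothetical policy
    thresholds, not Kubernetes defaults.
    """
    warnings = []
    min_replicas = spec.get("minReplicas", 1)  # Kubernetes uses 1 when omitted
    max_replicas = spec["maxReplicas"]         # required field
    scale_down = spec.get("behavior", {}).get("scaleDown", {})
    # The controller's scale-down window is 300 seconds unless overridden.
    window = scale_down.get("stabilizationWindowSeconds", 300)

    if min_replicas < min_floor:
        warnings.append(f"minReplicas={min_replicas} leaves little warm capacity")
    if max_replicas > min_replicas * max_growth:
        warnings.append(f"maxReplicas={max_replicas} allows very large scale-up")
    if window < min_down_window:
        warnings.append(f"scale-down window {window}s may cause flapping")
    return warnings


risky = {
    "minReplicas": 1,
    "maxReplicas": 100,
    "behavior": {"scaleDown": {"stabilizationWindowSeconds": 30}},
}
for message in audit_hpa_spec(risky):
    print(message)
```

Running a check like this in CI turns the three failure modes into review comments rather than incidents.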