Kubernetes Practical By Samson Tanimawo, PhD Published Apr 17, 2026 4 min read

HPA Tuning for Real Workloads

Default HPA settings are conservative. The tuning that catches bursts.

Metrics

CPU is default. Custom metrics for actual load (RPS, queue depth).

Latency-based HPA for user-facing services.

Thresholds

Scale-up at 60-70% utilization. Scale-down delay 5 min minimum.

Conservative on scale-down; eager on scale-up.

Avoid

Aggressive scale-down. Flapping wastes capacity.

Stabilization windows: tune by service.