By Nova AI Ops Team · Sep 20, 2026

Kubernetes Autoscaling: HPA, VPA, Cluster Autoscaler, Karpenter Compared

Five autoscaling primitives in Kubernetes solve five different problems and interact in subtle ways. Get the combination wrong and your cluster either oscillates wildly or fails to scale at all. Here is the practical comparison.

The Two Layers of Autoscaling

Kubernetes autoscaling operates at two distinct layers, and the tools at each layer solve different problems.

Pod-level autoscaling changes the number or size of pods in response to load. The relevant tools are HPA (changes pod count), VPA (changes pod resource requests), and KEDA (changes pod count based on external event sources).

Node-level autoscaling adds or removes nodes from the cluster in response to scheduling pressure. The relevant tools are Cluster Autoscaler and Karpenter.

A complete autoscaling setup needs both layers. Pod-level autoscaling without node-level means new pods are created but sit Pending with no nodes to host them. Node-level without pod-level means the cluster can add capacity, but fixed replica counts never ask for it.

HPA: Horizontal Pod Autoscaler

What it does: Adjusts the replica count of a Deployment, StatefulSet, or ReplicaSet based on observed metrics (default: CPU or memory).

How it works: The HPA controller queries the metrics API (backed by metrics-server for CPU and memory) every 15 seconds by default, computes the desired replica count as desired = ceil(current * (currentMetric / targetMetric)), and updates the workload's replica count.
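
For example, if 10 replicas are averaging 90% CPU against a 70% target, the controller computes ceil(10 * (90 / 70)) = ceil(12.86) = 13 and scales to 13 replicas.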

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: payment-svc
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: payment-svc
  minReplicas: 3
  maxReplicas: 50
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70

When to use: Stateless services with traffic that varies over time. The default scaling primitive every team should adopt for web-tier workloads.

Gotchas:

  - Utilization targets are computed against each container's resource requests; pods without CPU requests break the utilization math, and the HPA will not scale them.
  - Metrics lag reality: scale-up reacts to load that has already arrived, so a sharp spike can outrun it. Keep minReplicas high enough to absorb the first wave.
  - Scale-down is dampened by a stabilization window (300 seconds by default) to prevent flapping.
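
If the default pace fights your traffic pattern, the autoscaling/v2 behavior field shapes it directly. A minimal sketch with illustrative values, appended under spec in the manifest above:

  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300  # wait 5 minutes of low usage before shrinking
      policies:
      - type: Percent
        value: 50          # remove at most half the replicas...
        periodSeconds: 60  # ...per minute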

VPA: Vertical Pod Autoscaler

What it does: Adjusts the CPU and memory requests on individual pods based on historical usage.

Three modes:

  - Off: computes recommendations but applies nothing. Read the results from the VPA object's status.
  - Initial: applies recommended requests only when a pod is created; running pods are left alone.
  - Auto: evicts pods and recreates them with updated requests. Disruptive, so pair it with a PodDisruptionBudget.

When to use: Workloads with stable, predictable resource patterns. Best in Off mode initially to gather right-sizing recommendations, then apply manually after review.
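
A minimal recommendation-only VPA, reusing the payment-svc Deployment from the HPA example above (any workload name works):

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: payment-svc
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: payment-svc
  updatePolicy:
    updateMode: "Off"  # recommend only; apply by hand after review

Read the numbers back with kubectl describe vpa payment-svc.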

Critical gotcha: VPA in Auto mode does not work cleanly with HPA on the same workload when both key off the same resource metric (CPU or memory): HPA wants more replicas at the current size while VPA wants bigger pods at the current count, and the two controllers fight. Pick one per workload, scale HPA on a custom metric instead, or run VPA in Off mode (recommendations only) alongside HPA.

KEDA: Event-Driven Autoscaling

What it does: Extends HPA to scale based on external event sources: queue depth (Kafka, RabbitMQ, SQS), database row counts, Prometheus queries, cron schedules, and 60+ other scalers.

Why it matters: Many workloads scale better on a leading indicator than on CPU. A queue consumer should scale based on queue depth, not CPU utilization. KEDA makes this trivially easy.

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: payment-processor
spec:
  scaleTargetRef:
    name: payment-processor
  minReplicaCount: 0  # KEDA can scale to zero
  maxReplicaCount: 100
  triggers:
  - type: kafka
    metadata:
      bootstrapServers: kafka:9092
      consumerGroup: payment-group
      topic: payments
      lagThreshold: '100'

When to use: Any workload that processes external events (queues, streams, scheduled jobs). KEDA is the primary tool for serverless-style "scale to zero" patterns in Kubernetes.

Gotcha: Scale-to-zero introduces cold-start latency when traffic returns. Acceptable for batch and async workloads; usually not acceptable for synchronous user-facing services.

Cluster Autoscaler: The Legacy Option

What it does: Watches for unschedulable pods and adds nodes to the cluster from a configured node group. Removes nodes that have been underutilized for a configured period.

How it works: Cluster Autoscaler integrates with cloud-provider node groups (AWS Auto Scaling Groups, GCP Managed Instance Groups, etc.). When a pod cannot be scheduled because of resource pressure, CA increases the desired count on the node group. The cloud provider provisions a new VM, joins it to the cluster, and the pod schedules.
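
Configuration is mostly command-line flags on the CA deployment. A representative fragment for AWS, assuming your Auto Scaling Groups carry the standard auto-discovery tags (my-cluster is a placeholder):

    command:
    - ./cluster-autoscaler
    - --cloud-provider=aws
    - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/my-cluster
    - --balance-similar-node-groups
    - --scale-down-unneeded-time=10m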

When to use: Clusters with simple, predictable workload patterns and a small number of node-group shapes. Mature, well-understood, no surprises.

Limitations:

  - Bound to predefined node groups: every instance shape you want needs its own group, and CA scales only the groups it has been told about.
  - Slower to react: provisioning goes through the cloud provider's scaling-group APIs, so new capacity typically lands in 60-120 seconds or more.
  - Conservative scale-down: nodes must sit underutilized for a configured period, and pods without a safe eviction path block removal entirely.

Karpenter: The Modern Option

What it does: Replaces Cluster Autoscaler with a smarter, faster, and more flexible alternative. Karpenter dynamically picks the best node shape for the pending pods, rather than being bound to predefined node groups.

How it works: Karpenter watches for unschedulable pods and decides in real time which EC2 instance type best matches the pending workload's resource requirements. It can mix spot and on-demand, pick the cheapest available instance type, and consolidate underutilized nodes by rescheduling pods.

apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64", "arm64"]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 30s
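
On AWS, the NodePool must point at an EC2NodeClass (the nodeClassRef above) that tells Karpenter how to build the VM. A minimal sketch; the role name and discovery tags are placeholders for your cluster's values:

apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: default
spec:
  amiSelectorTerms:
    - alias: al2023@latest  # track the latest Amazon Linux 2023 AMI
  role: KarpenterNodeRole-my-cluster
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: my-cluster
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: my-cluster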

When to use: AWS EKS clusters (primary support) and increasingly other clouds. Teams commonly report 20-40% lower compute costs than with Cluster Autoscaler, driven by better instance-type selection and aggressive consolidation.

Why it wins: Faster provisioning (30-60 seconds vs 60-120 for CA), no node-group management, native support for diversified spot strategies, and continuous workload consolidation. The recommended choice for new EKS deployments in 2026.

How They Interact (and Conflict)

The combinations matter:

HPA + Cluster Autoscaler / Karpenter: The standard production setup. HPA scales pod count up; CA/Karpenter adds nodes to host the new pods. Works cleanly.

HPA + VPA on the same workload: Conflict. VPA changes resource requests; HPA scales replicas based on utilization of those requests. The two controllers fight. Use VPA in Off (recommend-only) mode if you also use HPA.

VPA + Cluster Autoscaler: Works, but VPA's pod-recreation can trigger node churn. Tune VPA's updatePolicy to limit how often pod recreation happens.

KEDA + HPA: KEDA creates an HPA under the hood. Do not configure both manually for the same workload.
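
You can verify what KEDA generated; by convention the managed HPA is named keda-hpa-<scaledobject-name>:

kubectl get hpa keda-hpa-payment-processor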

KEDA + Karpenter: The strongest stateless-async setup. KEDA scales pods to zero when there is no work; Karpenter rapidly provisions nodes when work arrives.

The Recommended Stack for 2026

For a typical production Kubernetes cluster in 2026, this combination delivers the best balance of cost and performance:

  1. HPA on every stateless web-tier deployment, with min/max replicas tuned for traffic patterns and a custom metric (request rate or latency) where possible.
  2. KEDA for queue consumers, batch processors, and any workload triggered by external events. Enables scale-to-zero for off-hours cost savings.
  3. VPA in recommendation mode for ongoing right-sizing analysis. Apply recommendations manually during quarterly reviews.
  4. Karpenter as the node-level autoscaler. Configure with mixed spot + on-demand, multiple instance families, and aggressive consolidation.

Skip Cluster Autoscaler unless you are on a cloud where Karpenter is not yet supported. Skip VPA in Auto mode unless you have a workload where HPA is not viable (rare for stateless workloads, more common for stateful databases).

For teams that want autoscaling decisions informed by application-level intelligence (correlating scale events with deployment changes, predicting traffic spikes from historical patterns, pre-warming capacity for known peak hours), AI-native platforms like Nova AI Ops add a layer of predictive autoscaling on top of these primitives. The platform also detects scaling anti-patterns, such as services that constantly oscillate or scale events that repeatedly fail, and recommends configuration changes. Try Nova to evaluate it for yourself.