Elasticsearch Tuning

Shards; replicas; refresh.

Overview

Elasticsearch tuning matches shard count, replica count, refresh interval, JVM heap, and index lifecycle management (ILM) settings to the actual workload rather than leaving them at their defaults. The defaults are conservative for safety; tuned clusters routinely run 2-5x more efficiently. Most production Elasticsearch incidents trace back to one of four causes: too many small shards, a refresh interval that is too aggressive, a JVM heap above the compressed-object-pointer boundary (the exact threshold varies by JVM but sits just below 32 GB, so 30 GB is the usual cap), or no ILM policy, which leaves hot data on the hot tier forever.
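
One way to check for the first of those causes is to scan primary shard sizes. The sketch below is illustrative rather than a definitive tool: `audit_shards` is a hypothetical helper, the 30-50 GB band is this article's target, and the input is assumed to be rows shaped like the JSON output of `GET _cat/shards?format=json&bytes=b`, reduced to the fields used here.

```python
GB = 1024 ** 3  # bytes per gibibyte


def audit_shards(shards, min_gb=30, max_gb=50):
    """Split primary shards into (undersized, oversized) lists.

    Each input row is a dict with "index", "shard", "prirep" and "store"
    (size in bytes, as a string). That row shape is an assumption.
    """
    undersized, oversized = [], []
    for row in shards:
        store = row.get("store")
        if row.get("prirep") != "p" or store is None:
            continue  # skip replicas and shards that report no size
        size_gb = int(store) / GB
        entry = (row["index"], row["shard"], round(size_gb, 1))
        if size_gb < min_gb:
            undersized.append(entry)
        elif size_gb > max_gb:
            oversized.append(entry)
    return undersized, oversized
```

A small shard in an index that is still filling is normal, so the undersized list is a prompt to look closer, not a verdict.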

Shards. Number of primary shards per index; 30-50 GB per shard is the typical sweet spot.
Replicas. Replica count for resilience and read throughput; one replica minimum, more for read-heavy workloads.
Refresh interval. How often new data becomes searchable; the 1s default is aggressive, and 30s suits most workloads that are not search-critical.
JVM heap plus index lifecycle. Heap set to half of node memory, capped at 30 GB to stay under the compressed-object-pointer boundary; ILM moves data through hot/warm/cold/frozen tiers.
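
The shard and heap rules above reduce to simple arithmetic. The following is a minimal sketch under stated assumptions: the function names are hypothetical, the 40 GB target is just the midpoint of the 30-50 GB range, and the expected index size is something you estimate yourself.

```python
import math


def primary_shard_count(expected_index_gb, target_shard_gb=40):
    """Primaries needed so each shard lands near target_shard_gb (never below 1)."""
    return max(1, math.ceil(expected_index_gb / target_shard_gb))


def heap_gb(node_memory_gb, cap_gb=30):
    """Half of node memory, capped at cap_gb."""
    return min(node_memory_gb // 2, cap_gb)


def jvm_heap_options(node_memory_gb):
    """JVM flags with minimum and maximum heap set to the same value."""
    size = heap_gb(node_memory_gb)
    return f"-Xms{size}g\n-Xmx{size}g"


print(primary_shard_count(600))  # 15 shards of about 40 GB each
print(heap_gb(64))               # 30 (half would be 32, so the cap applies)
print(heap_gb(16))               # 8
```

A 600 GB index therefore gets 15 primaries, and a 64 GB node gets a 30 GB heap, with the remaining memory left to the operating system's filesystem cache.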

The approach

The practical approach has five parts: size shards to 30-50 GB each (smaller shards add per-shard overhead, larger ones distribute unevenly across nodes); keep at least one replica for resilience and read throughput; set the refresh interval to 30s for workloads that are not search-critical (the 1s default is aggressive); set the JVM heap to half the node's memory, capped at 30 GB to stay below the compressed-object-pointer boundary; and use ILM policies to move data through the hot, warm, cold, and frozen tiers as it ages.
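
The shard, replica, and refresh choices map onto three index settings. As a hedged sketch, the helper below (a hypothetical name) builds a request body that could be sent when creating an index with `PUT /<index>`; the shard count of 3 is a placeholder, not a recommendation.

```python
import json


def index_settings(primary_shards, replicas=1, refresh_interval="30s"):
    """Build a create-index request body with the three tuned settings."""
    return {
        "settings": {
            "index": {
                # Fixed at index creation; changing it later needs a
                # shrink, split, or reindex.
                "number_of_shards": primary_shards,
                # Both of these can be changed later on a live index.
                "number_of_replicas": replicas,
                "refresh_interval": refresh_interval,
            }
        }
    }


body = index_settings(primary_shards=3)
print(json.dumps(body, indent=2))
```

Because `number_of_replicas` and `refresh_interval` are dynamic, they can be adjusted afterwards through `PUT /<index>/_settings`, while the primary shard count has to be right up front.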

Shard size: 30-50 GB per shard. Fewer, larger shards reduce per-shard overhead; more, smaller shards spread more evenly across nodes; this range balances the two for most workloads.
One replica minimum. Resilience and read throughput; replica count grows with read load.
Refresh interval: 30s for non-search-critical. The 1s default is aggressive; a 30s interval creates fewer, larger segments, which cuts merge work and improves indexing throughput.
Heap at half of memory, capped at 30 GB, plus ILM. A heap past the compressed-object-pointer boundary switches to wider pointers and pays for them in memory and GC time; ILM moves data through tiers automatically.
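
The ILM part of the approach can be sketched as a policy body for `PUT _ilm/policy/<name>`. Everything specific here is an assumption made for illustration: the helper name, the phase ages, the 50 GB rollover threshold (taken from the top of the shard range above), and the repository name `my-repo`, which would have to be a snapshot repository already registered on the cluster.

```python
import json


def ilm_policy(snapshot_repository, warm_after="7d", cold_after="30d",
               frozen_after="90d"):
    """Build an ILM policy body covering hot, warm, cold and frozen phases."""
    return {
        "policy": {
            "phases": {
                "hot": {
                    # Roll over to a new index once a primary shard reaches 50 GB.
                    "actions": {
                        "rollover": {"max_primary_shard_size": "50gb"}
                    }
                },
                "warm": {
                    "min_age": warm_after,
                    # Merge down to one segment once the index is no longer written.
                    "actions": {"forcemerge": {"max_num_segments": 1}},
                },
                "cold": {
                    "min_age": cold_after,
                    # Recover these indices last after a node restart.
                    "actions": {"set_priority": {"priority": 0}},
                },
                "frozen": {
                    "min_age": frozen_after,
                    "actions": {
                        "searchable_snapshot": {
                            "snapshot_repository": snapshot_repository
                        }
                    },
                },
            }
        }
    }


policy = ilm_policy("my-repo")  # "my-repo" is a placeholder repository name
print(json.dumps(policy, indent=2))
```

The ages should follow how quickly queries against the data taper off in your own workload; the values above are placeholders.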

Why this compounds

Elasticsearch tuning compounds across the cluster lifetime. Each tuned setting keeps paying off in performance; each ILM policy keeps the cluster manageable as data accumulates; and the team builds search-platform experience that pays off on every new index.

Search speed. The right shard count keeps queries fast; search latency tracks the shape of the data.
Cost efficiency. Tier-aware ILM matches cost to access; cold data on cheap storage, hot data on fast disk.
Stability. Tuned heap and shards reduce GC and memory pressure; nodes avoid long GC pauses and swapping.
Institutional knowledge. Each tuning iteration teaches the team how the engine behaves; the team builds vocabulary for Elasticsearch operation.

Elasticsearch tuning is an operational discipline that pays off across years. Nova AI Ops integrates with search telemetry, surfaces tuning patterns, and supports the team’s search-platform discipline.