Set Up VictoriaMetrics

High-scale TSDB.

Overview

VictoriaMetrics is a high-scale, low-resource time-series database that speaks PromQL. Compared to Prometheus, a single node can ingest millions of samples per second on suitable hardware, typically uses less memory for the same workload, supports years of retention on commodity disk, and scales horizontally in cluster mode (vminsert, vmstorage, vmselect). For deployments that hit Prometheus's scaling limits or need years of metrics history, VictoriaMetrics is a practical upgrade path.

High-scale TSDB. Millions of samples per second per node on suitable hardware; throughput beyond the point where a single Prometheus server typically tops out.
Low resource use. Lower memory footprint than Prometheus for the same workload; the savings grow with the volume of metric data.
PromQL plus MetricsQL. MetricsQL accepts PromQL queries and adds extensions, so existing Prometheus skills and dashboards largely carry over.
Cluster mode plus long-term storage. Horizontal scaling via vminsert/vmstorage/vmselect; years of retention on commodity disk, with backups to object storage via vmbackup.
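
Because VictoriaMetrics exposes the Prometheus-compatible HTTP query API, existing PromQL can usually be sent to it unchanged. The sketch below only builds the request URL for an instant query; the host name and the example query are hypothetical, and 8428 is the default HTTP port of a single-node instance.

```python
from typing import Optional
from urllib.parse import urlencode


def instant_query_url(base_url: str, promql: str, time: Optional[str] = None) -> str:
    """Build a URL for the Prometheus-compatible instant query endpoint."""
    params = {"query": promql}
    if time is not None:
        # Optional evaluation timestamp (RFC 3339 or Unix seconds).
        params["time"] = time
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode(params)}"


# Hypothetical host and query, for illustration only.
url = instant_query_url(
    "http://victoriametrics:8428",
    "sum(rate(http_requests_total[5m])) by (job)",
)
print(url)
```

The same URL shape is what a Grafana Prometheus data source would call, which is why dashboards built for Prometheus tend to work after only the data source address changes.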

The approach

The practical approach has five parts: start with a single node for small deployments, where operations are simpler; move to cluster mode when ingestion or retention exceeds single-node capacity; use vmagent for collection, as a lightweight replacement for Prometheus scrape pods; make long retention the team default by setting -retentionPeriod explicitly (VictoriaMetrics keeps one month of data unless told otherwise), since storage is cheap and historical analysis is valuable; and document each cluster's topology in the infrastructure repo so the configuration is reviewable.

Single node to start. Simpler operations for small deployments; cluster mode adds operational complexity that single-node avoids.
Cluster mode for scale. vminsert, vmstorage, vmselect components; horizontal scaling for ingestion and query.
vmagent for collection. A lightweight collector that takes over scraping from Prometheus; it reads existing Prometheus scrape configs.
Long retention plus documented topology. Years of metrics on commodity disk; per-cluster configuration committed for operational review.
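
One way to keep the single-node and cluster topologies reviewable is to derive the endpoints from a small, committed description. The sketch below is illustrative rather than an official tool: the host names and file path are hypothetical, the ports are the documented defaults (8428 for single-node, 8480 for vminsert, 8481 for vmselect), and tenant 0 is assumed for cluster mode.

```python
from typing import List


def remote_write_url(host: str, cluster: bool, account_id: int = 0) -> str:
    """Return the remote-write URL that vmagent should push samples to."""
    if cluster:
        # Cluster mode: writes go through vminsert, namespaced by tenant accountID.
        return f"http://{host}:8480/insert/{account_id}/prometheus/api/v1/write"
    # Single-node: one process accepts writes and serves queries on port 8428.
    return f"http://{host}:8428/api/v1/write"


def query_base_url(host: str, cluster: bool, account_id: int = 0) -> str:
    """Return the base URL to configure as a Prometheus-type data source."""
    if cluster:
        # Cluster mode: reads go through vmselect.
        return f"http://{host}:8481/select/{account_id}/prometheus"
    return f"http://{host}:8428"


def vmagent_args(scrape_config_path: str, write_url: str) -> List[str]:
    """Command-line flags for vmagent: reuse a Prometheus scrape config, push to VictoriaMetrics."""
    return [
        f"-promscrape.config={scrape_config_path}",
        f"-remoteWrite.url={write_url}",
    ]


# Hypothetical hosts and path, for illustration only.
single = remote_write_url("victoriametrics", cluster=False)
clustered = remote_write_url("vminsert", cluster=True)
print(vmagent_args("/etc/prometheus/prometheus.yml", single))
print(clustered)
print(query_base_url("vmselect", cluster=True))
```

Moving from single-node to cluster mode then changes only the write and query URLs; the scrape config that vmagent reads stays the same.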

Why this compounds

Experience with VictoriaMetrics compounds across services. Each year of retained metrics extends how far back an investigation can look, which shorter retention cannot match; the team learns to operate a high-cardinality TSDB with less memory than Prometheus would need for the same workload; and new services inherit a high-scale metrics backend by default.

Cost efficiency. Lower resource use than Prometheus; costs fall where the team would otherwise add Prometheus replicas.
Long-term metrics. Historical analysis becomes possible; year-over-year comparisons anchor planning conversations.
Higher cardinality support. VictoriaMetrics typically handles more series than Prometheus on the same hardware budget; modern label-heavy services fit.
Institutional knowledge. Each query and dashboard reinforces shared monitoring patterns; the team builds a vocabulary for operating a high-cardinality TSDB.
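
To make the retention cost concrete, a rough disk estimate multiplies ingestion rate, retention, and bytes per sample. This is a back-of-the-envelope sketch, not a sizing guarantee: the 1 byte per sample default is an assumption standing in for compressed sample size, which varies with the data, and the workload numbers are hypothetical. Measure on real data before committing to hardware.

```python
SECONDS_PER_DAY = 86_400


def estimated_disk_bytes(
    samples_per_second: float,
    retention_days: float,
    bytes_per_sample: float = 1.0,  # assumption: measure this on your own workload
) -> float:
    """Rough on-disk size: ingestion rate x retention window x compressed sample size."""
    return samples_per_second * SECONDS_PER_DAY * retention_days * bytes_per_sample


def to_gib(size_bytes: float) -> float:
    """Convert bytes to GiB."""
    return size_bytes / 2**30


# Hypothetical workload: 100k samples/s kept for two years.
size = estimated_disk_bytes(samples_per_second=100_000, retention_days=730)
print(f"{to_gib(size):.0f} GiB before headroom")
```

Leave free-space headroom on top of the estimate, since background merges need working room on disk.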

VictoriaMetrics is an infrastructure investment that pays off across years. Nova AI Ops integrates with metrics telemetry, surfaces TSDB patterns, and supports the team’s monitoring discipline.