Alerts Practical By Samson Tanimawo, PhD Published Feb 18, 2026 4 min read

Cardinality Explosion Alert

Cardinality spikes are among the most expensive monitoring problems. Alert on them.

Why cardinality matters

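Most hosted time-series backends bill per active series, and every unique combination of label values is a separate series. A label like user_id creates at least one series per user, so cardinality and the bill both grow with the user base.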

As a rough guide, Prometheus and Cortex start to slow down above 1M active series. Above 10M, queries time out and ingestion drops. The exact thresholds depend on available memory and query load.

Cardinality is the silent killer of observability stacks. The first warning is the bill, not the alert.

Prometheus: alert when `prometheus_tsdb_head_series` grows more than 30% week-over-week.
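The week-over-week check can be sketched as a Prometheus alerting rule. Only the metric and the 30% threshold come from the text; the group name, alert name, `for` duration, and severity label are illustrative assumptions.

```yaml
groups:
  - name: cardinality                   # hypothetical group name
    rules:
      - alert: HeadSeriesGrowingFast    # hypothetical alert name
        # Fires when the current head series count exceeds 130% of
        # the value recorded one week earlier on the same instance.
        expr: prometheus_tsdb_head_series > 1.3 * (prometheus_tsdb_head_series offset 1w)
        for: 1h                         # assumed hold time to ignore short spikes
        labels:
          severity: warning
        annotations:
          summary: "Active series on {{ $labels.instance }} grew more than 30% week-over-week"
```

The `offset 1w` lookup needs at least a week of retained data for `prometheus_tsdb_head_series`; with shorter retention the right-hand side is empty and the rule never fires.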

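Or alert on per-metric cardinality: `count by (__name__) ({__name__=~".+"}) > 100000` returns every metric name with more than 100k series. (Wrapping it in a second `count(...)` would only return how many metrics are over the threshold, not which ones.)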
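A per-metric alert might look like the sketch below. It counts series per metric name and copies the name into a separate label, here called `metric` (an assumed name), because Prometheus removes `__name__` from the alert's label set and alerts for different metrics would otherwise collide. The group name, alert name, and intervals are also assumptions.

```yaml
groups:
  - name: cardinality-per-metric        # hypothetical group name
    interval: 5m                        # the query touches every series, so evaluate it sparingly
    rules:
      - alert: MetricCardinalityOverBudget   # hypothetical alert name
        # count by (__name__) yields one sample per metric name;
        # label_replace copies that name into the `metric` label.
        expr: |
          label_replace(
            count by (__name__) ({__name__=~".+"}),
            "metric", "$1", "__name__", "(.+)"
          ) > 100000
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "Metric {{ $labels.metric }} has {{ $value }} series (budget: 100000)"
```

On a large server this query is itself expensive, which is one reason to run it on a long evaluation interval rather than the default.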

Datadog and Honeycomb expose cardinality dashboards; alert on the per-metric column when it crosses a budget.

Per-team budget: 1M active series. Per-metric: 100k unique combinations.

Above budget, the team must drop a label or aggregate. CI fails the deploy that adds a high-cardinality label.

Publish the budget and current usage in a dashboard. Visibility is half the discipline.
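One way to feed such a dashboard is a pair of recording rules plus an alert on the ratio. This sketch assumes every series carries a `team` label, which the text does not state; the rule names and the `for` duration are likewise illustrative.

```yaml
groups:
  - name: cardinality-budgets           # hypothetical group name
    interval: 5m
    rules:
      # Current active series per team (assumes a `team` label exists).
      - record: team:active_series:count
        expr: count by (team) ({__name__=~".+", team!=""})
      # Fraction of the 1M per-team budget currently in use.
      - record: team:active_series:budget_ratio
        expr: team:active_series:count / 1e6
      - alert: TeamSeriesBudgetExceeded # hypothetical alert name
        expr: team:active_series:budget_ratio > 1
        for: 30m
        labels:
          severity: warning
        annotations:
          summary: "Team {{ $labels.team }} is over its 1M active series budget"
```

Graphing `team:active_series:budget_ratio` per team shows both the budget and current usage in a single panel.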

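User IDs, request IDs, and full URLs in labels. Replace them with route patterns or bucketed values. A hash only helps if it is reduced to a fixed number of buckets, because a plain hash has as many distinct values as the original ID.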

Pod names and container IDs from Kubernetes, which include random suffixes and change on every rollout. Use the deployment name instead.

Customer IDs at the metric level. Move to per-tenant aggregates and use traces for per-tenant detail.

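Drop the offending label at scrape time or in the OTel Collector: a `labeldrop` rule under Prometheus `metric_relabel_configs`, or the Collector's attributes processor. Check that the remaining labels still identify each series uniquely.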
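On the Prometheus side, dropping a label could look like this sketch. The job name, target, and the `request_id` label are placeholders, not taken from a real configuration.

```yaml
scrape_configs:
  - job_name: example-app               # hypothetical job
    static_configs:
      - targets: ["localhost:8080"]     # hypothetical target
    metric_relabel_configs:
      # Remove the request_id label from every scraped series.
      # Only safe when the remaining labels still distinguish the
      # series; otherwise samples in the same scrape collide.
      - action: labeldrop
        regex: request_id
```

When the label is the only thing separating two series, removing it at scrape time produces duplicates rather than a sum, so in that case the fix belongs in the instrumentation or in an aggregation rule.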

Aggregate up: `sum by (deployment) (...)` replaces per-pod metrics with per-deployment.
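As a recording rule, the aggregation might be written as follows. The metric `http_requests_total`, the `deployment` label, and the rule name are assumptions for illustration.

```yaml
groups:
  - name: aggregation                   # hypothetical group name
    rules:
      # Per-deployment request rate; the per-pod dimension is summed away.
      - record: deployment:http_requests:rate5m
        expr: sum by (deployment) (rate(http_requests_total[5m]))
```

A recording rule adds series rather than removing them. The saving comes from what happens to the per-pod series afterwards, for example forwarding only the aggregate to long-term storage via `write_relabel_configs` under `remote_write`.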

Add a metric_relabel_configs rule that drops the worst series. Document what got dropped so debugging is possible.
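A sketch of such a rule, with the documentation kept next to it as comments. The job, metric names, and the `path` label are hypothetical.

```yaml
scrape_configs:
  - job_name: example-app               # hypothetical job
    static_configs:
      - targets: ["localhost:8080"]     # hypothetical target
    metric_relabel_configs:
      # DROPPED: every series of one per-user histogram.
      # Record the reason and the owner here so a missing metric
      # can be traced back to this rule during debugging.
      - source_labels: [__name__]
        regex: http_request_duration_by_user_seconds.*
        action: drop
      # DROPPED: http_requests_total series for debug endpoints only.
      # Source label values are joined with ";" before matching.
      - source_labels: [__name__, path]
        regex: http_requests_total;/debug/.*
        action: drop
```

Relabel regexes are anchored at both ends, so each pattern has to match the whole joined value, not just a substring.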