PromQL Patterns That Scale to 10M Series
PromQL is easy at small scale and slow at large scale. These are the patterns that distinguish a query that returns in 200ms from one that times out at 30 seconds.
Where PromQL cost comes from
Two things drive query time: the number of series scanned and the time range. A query like rate(http_requests_total[5m]) evaluated over 30 days on 5 million series has to touch 30 days × 5M series × every sample in between, an enormous volume of data. Rein both in.
The math at scale. Prometheus stores ~120 samples per series per hour (one per 30 seconds). Over 30 days that's roughly 86k samples per series. 5M series × 86k samples ≈ 430 billion data points to scan. The query takes minutes and sometimes times out.
The architectural reality. Prometheus is not a general analytic database. It's optimised for low-cardinality monitoring at high resolution. Asking it analytical questions (long time ranges, high cardinality) hits the architecture's limits. The patterns below work WITH the architecture, not against it.
Recording rules
Pre-compute expensive aggregations ahead of time. job:http_requests_total:rate5m stores a one-series-per-job summary that the dashboard queries instead of re-computing the original. Recording rules turn a 30-second query into a 30ms one.
The mechanics. A recording rule defines a query that Prometheus evaluates continuously (typically every 30s); the result is stored as a new metric. Dashboards query the new metric instead of recomputing the original. The expensive aggregation happens once per evaluation; every query against it is cheap.
The naming convention. level:metric:operation, for example job:http_requests_total:rate5m means "rate over 5 minutes of http_requests_total, aggregated to job level." The convention helps engineers find the right pre-computed metric instead of writing the expensive original.
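A minimal sketch of what such a rule might look like in a Prometheus rule file (the group name, interval, and file layout here are illustrative):

```yaml
groups:
  - name: http-recording-rules
    interval: 30s   # evaluation cadence; matches the 30s assumption above
    rules:
      # level:metric:operation, the job-level 5-minute rate of http_requests_total
      - record: job:http_requests_total:rate5m
        expr: sum by (job) (rate(http_requests_total[5m]))
```

Dashboards then query job:http_requests_total:rate5m directly.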
The trade-off. Storage cost (each recording rule adds a metric). Compute cost during rule evaluation (each rule runs every 30s). Both are dramatically less than running the original query on every dashboard load. Pay once per evaluation instead of once per query, and the math is overwhelmingly in favour of the rule.
Labels vs aggregation
If a query ends with by (label) and the label has 10k distinct values, you computed 10k results just to render a 5-line chart. Push the aggregation up: query for the top-N first, drop the rest.
The pattern. Replace sum by (user_id)(rate(http_requests_total[5m])) with topk(10, sum by (user_id)(rate(http_requests_total[5m]))). Same query intent (per-user rate); the topk limits the output to 10 series. The dashboard renders 10 series instead of 10k, and transfer and rendering cost drop correspondingly, even though Prometheus still evaluates the inner aggregation in full.
The other discipline. Drop labels you don't need before the final aggregation. sum without (instance)(metric) aggregates across instances; the rest of the query then operates on the aggregated series, which is faster than carrying every instance through.
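A sketch of both disciplines, reusing the illustrative metric and labels from above:

```promql
# Before: one output series per user_id (potentially 10k+)
sum by (user_id) (rate(http_requests_total[5m]))

# After: only the 10 busiest users reach the dashboard
topk(10, sum by (user_id) (rate(http_requests_total[5m])))

# Drop labels you don't chart before the final aggregation
sum without (instance) (rate(http_requests_total[5m]))
```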
Avoid subqueries
Constructions like rate(metric[5m:30s]) recompute rate at every step. They are correct, occasionally necessary, and almost always slow. Replace with recording rules where you can.
The mechanism. Subqueries cause Prometheus to evaluate the inner expression at every step of the outer expression's range. For a 1-day query at 1-minute resolution, that's 1440 sub-evaluations. Each sub-evaluation does work; total cost is multiplicative.
The replacement. Most subqueries can be replaced with recording rules. Pre-compute the inner expression as a recording rule; the outer expression then queries the pre-computed metric. Same answer, dramatically faster.
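A before/after sketch, assuming the job:http_requests_total:rate5m rule from earlier exists:

```promql
# Subquery: the inner rate() is re-evaluated at every 30s step of the 1h window
max_over_time(sum by (job) (rate(http_requests_total[5m]))[1h:30s])

# Recording-rule replacement: the inner expression is already stored,
# so the outer function just reads samples
max_over_time(job:http_requests_total:rate5m[1h])
```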
The legitimate use cases for subqueries. One-off queries during incident response, where there's no recording rule yet. Ad-hoc analytics. Both are tolerable because they're infrequent. Dashboard queries should never use subqueries; the dashboard load amplifies the cost.
The 5-step optimisation checklist
1. Is there an aggregation we can pre-compute? Use a recording rule.
2. Are we scanning more time range than the dashboard shows? Tighten the range.
3. Are we using `rate()` on a Counter that resets? Use `increase()`.
4. Are we joining two metrics with `on()`? Make sure the labels match exactly; otherwise it returns empty.
5. Are we using `topk`/`bottomk`? Move it as early in the expression as possible.
Step 1: recording rule check. If the same expensive aggregation appears in 3+ dashboards, it deserves a recording rule. The pattern emerges over time; review quarterly.
Step 2: range check. Dashboards default to long ranges (e.g., 7 days); queries scan the entire range even if only the last hour is plotted. Tighten ranges to what's actually needed; query cost drops.
Step 4: join discipline. on(job) joins matching metrics by the job label. If one side has additional labels (e.g., instance), use group_left/group_right or aggregate first. Mismatched joins return empty results, which are confusing to debug.
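A sketch of a many-to-one join that keeps the extra instance label, computing each instance's share of its job's total (metric names as before):

```promql
# Left side carries {job, instance}; the right side has only {job}.
# group_left declares the many-to-one direction; without it the query errors out.
sum by (job, instance) (rate(http_requests_total[5m]))
  / on (job) group_left
    job:http_requests_total:rate5m
```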
Dashboard query patterns
Dashboard queries should use recording rules wherever possible. Each panel queries a pre-computed metric; the page loads in 200ms instead of 30 seconds. Multiplied across 20 panels, that's the difference between a snappy dashboard and a dashboard nobody uses.
The discipline. When building a new dashboard, write the queries first. For each, ask: "would this benefit from a recording rule?" If yes, create the rule before the dashboard ships. The dashboard never has slow panels.
The maintenance. Recording rules accumulate. Audit annually; delete rules that aren't used. The TSDB cardinality burden of unused rules is real.
Common antipatterns
The query that times out. An engineer writes a complex query, tries it on the dashboard, and it times out. The usual fix: increase the timeout. Wrong fix; the query needs optimisation. Apply the 5-step checklist instead.
Subqueries for "convenience." Engineers use subqueries because the syntax is shorter than setting up a recording rule. Subqueries are slower, so engineer convenience is traded for user pain. Use recording rules for repeated patterns.
The dashboard with 50 panels. Each panel queries the database; the dashboard load runs 50 queries. Even with optimisation, that's a lot of work. Consolidate; most panels can be combined or removed.
Hardcoded ranges. Dashboard always queries last 7 days, even when the user is looking at last hour. Use Grafana's `$__range` variable so queries match the dashboard's selected range.
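For example, a panel query can interpolate Grafana's range variable so the window always follows the time picker (a sketch; adjust the aggregation to the panel):

```promql
# Grafana substitutes $__range with the dashboard's selected range, e.g. 1h or 7d
sum by (job) (increase(http_requests_total[$__range]))
```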
What to do this week
Three moves. (1) Find your team's slowest dashboard. Apply the 5-step checklist to its queries. Most teams find 3-5 queries that benefit from recording rules. (2) Establish a naming convention for recording rules. Convention prevents duplicate rules and makes them findable. (3) Audit recording rules for unused ones. Delete the ones that haven't been queried in 90 days; storage cost drops.