The Metric Naming Convention That Survives
Most metric naming is haphazard. The convention that scales across teams and produces queryable names.
The pattern
The metric naming pattern is domain.subject.action.unit. Predictable so engineers can guess names, greppable so dashboards stay maintainable, units at the end so unit confusion stops being a regular bug.
- domain.subject.action.unit shape. Four parts per metric, e.g.
http.request.duration.seconds; predictable shape lets engineers write metrics without a registry lookup. - Lowercase, period-separated, singular. Consistent tokens across the codebase; mixing styles produces grep misses and dashboard breakage.
- Units at the end. Time in seconds, sizes in bytes, rates per second; the convention removes ambiguity that would otherwise show up as silent dashboard bugs.
- Documented unit choice per metric. Explicit ms-or-seconds decision per metric type; mixed-unit dashboards are the failure mode without it.
Rules
Three rules keep names usable. Consistent separators, no obscure abbreviations, no boolean-named metrics. Each one prevents a failure mode that real codebases hit.
- One separator consistently. Periods or underscores, pick one per codebase; mixing is a smell that compounds with every new metric.
- No obscure abbreviations. "http" is fine; "cnxn" is not. Engineers should not need a glossary to read a dashboard.
- No boolean-named metrics. Encode boolean state as the value (0/1), not the name (
metric_name_activeis awkward and makes aggregation hard). - Linter check per rule. Explicit linter pattern per rule supports automated enforcement; conventions without enforcement decay.
Labels vs metric names
Labels carry dimensions; metric names carry behaviours. High-cardinality labels belong in logs, not metrics, because every unique label combination is a separate time series.
- Metric per behaviour, label per dimension.
http.request.durationwith a method label, not separate metrics per HTTP method. - Cardinality cost per label. Each unique label combination is a separate time series; high cardinality drives storage and query cost.
- High-cardinality labels go in logs. user_id, request_id, trace_id all belong in logs, not metrics; the metrics system will not survive them.
- Standard labels across services. service, environment, version, region applied uniformly; consistency unlocks cross-service queries.
Enforcement
Enforcement makes the convention real. Linting at PR time catches violations before merge; migration aliasing eases renames without breaking dashboards; quarterly audits surface drift before it accumulates.
- Lint at PR time. Custom linter check rejects non-conforming names; the convention becomes a property of merging, not a property of culture.
- Migration aliasing. Old-to-new alias per rename so old metric keeps working while dashboards migrate; cutover happens on a schedule, not a moment.
- Quarterly audit. Drift-from-convention surfaced and fixed each quarter; one-shot conventions decay otherwise.
- Named owner per rule. Maintaining team for the linter and convention doc; stale rules are how conventions die.
Why this matters
Naming conventions compound across years. Guessable names produce productivity gain on every dashboard query; cross-team queries become possible; metric names outlive the teams and tools that created them.
- Engineers guess metric names. No-lookup-required productivity gain per engineer per day; compounds across the team.
- Cross-team queries. Consistent names enable operations dashboards composed from per-team metrics; without convention, every dashboard is bespoke.
- Names outlive teams and tools. Multi-year persistence per metric; convention discipline pays back across migrations and reorgs.
- Published convention per org. Central name guide supports onboarding and survives the original authors.