Metrics vs Traces for Performance

Different views.

Overview

Metrics and traces are not competing observability signals; they are complementary views of the same system. Metrics show aggregate behaviour over time at low cost; traces show per-request detail at higher cost. Performance investigation almost always uses both: metrics narrow the question, traces answer it. Treating either one as “the” observability signal leaves the other half of the problem invisible.

Two different views of the same system. Metrics aggregate; traces follow individual requests. Both are needed; neither is sufficient.
Metrics for aggregate trends. Per-service latency, error rate, throughput over time. The dashboards the on-call engineer watches.
Traces for per-request detail. The end-to-end path of a single request through the system. The data that explains why p99 spiked.
Combined investigation plus per-tier budget. Metrics narrow the question and traces resolve it; a per-tier observability budget keeps both affordable.
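To make the contrast concrete, here is a minimal Python sketch under simplifying assumptions: the `Request` and `Span` types, the span names, and the rule that a request's latency is the sum of its spans are all hypothetical, not any tracing library's data model. The metric view collapses many requests into one number; the trace view keeps one request's span breakdown.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Span:
    name: str            # hypothetical span name, e.g. "db.query"
    duration_ms: float


@dataclass
class Request:
    trace_id: str
    service: str
    spans: list[Span] = field(default_factory=list)

    @property
    def latency_ms(self) -> float:
        # Simplifying assumption: spans run sequentially, so latency is their sum.
        return sum(s.duration_ms for s in self.spans)


def percentile(values: list[float], p: float) -> float:
    """Nearest-rank percentile: smallest value with at least p% of values at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def metric_view(requests: list[Request], service: str, p: float = 99.0) -> float:
    """Metric: one aggregate number per service; per-request detail is discarded."""
    return percentile([r.latency_ms for r in requests if r.service == service], p)


def trace_view(request: Request) -> list[tuple[str, float]]:
    """Trace: one request's span breakdown, slowest span first."""
    return sorted(((s.name, s.duration_ms) for s in request.spans),
                  key=lambda pair: pair[1], reverse=True)


# Usage: 98 fast checkout requests and 2 slow ones (made-up numbers).
fast = [Request(f"t{i}", "checkout", [Span("auth", 5.0), Span("db.query", 20.0)])
        for i in range(98)]
slow = [Request(f"s{i}", "checkout", [Span("auth", 5.0), Span("db.query", 480.0)])
        for i in range(2)]
requests = fast + slow

print(metric_view(requests, "checkout"))  # 485.0: the spike, but not its cause
print(trace_view(slow[0]))                # [('db.query', 480.0), ('auth', 5.0)]
```

The metric reports a checkout p99 of 485 ms but cannot say why; the trace of a slow request shows `db.query` accounting for 480 ms of it.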

The approach

Three habits produce fast root cause: metrics for trends, traces for investigation, and the discipline to use both together rather than picking a favourite.

Metrics for trends. Per-service latency, error rate, throughput on the standing dashboard. The view the operations team starts every shift with.
Traces for investigation. Per-request path with span timing. The data that turns “p99 is high” into “this database call is slow”.
Combined investigation flow. Metrics narrow which service or endpoint is misbehaving; traces explain why.
Per-tier observability budget plus documented strategy. Sampling and retention are tuned per tier, and each team documents its observability strategy in the runbook.
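The combined flow can be sketched in a few lines of Python. This is a minimal illustration, not a real tooling integration: the in-memory latency lists stand in for a metrics backend, the trace dictionaries stand in for a tracing backend, and the p99 threshold, endpoint names, and span names are hypothetical. Metrics pick the endpoint to look at; traces name the span responsible.

```python
import math


def p99(values):
    """Nearest-rank 99th percentile."""
    ordered = sorted(values)
    return ordered[max(1, math.ceil(99 * len(ordered) / 100)) - 1]


def narrow_with_metrics(latencies_by_endpoint, threshold_ms):
    """Step 1 (metrics): the endpoint whose p99 exceeds the threshold by the most, or None."""
    overshoot = {}
    for endpoint, latencies in latencies_by_endpoint.items():
        value = p99(latencies)
        if value > threshold_ms:
            overshoot[endpoint] = value - threshold_ms
    return max(overshoot, key=overshoot.get) if overshoot else None


def explain_with_traces(traces, endpoint):
    """Step 2 (traces): the longest span in the slowest sampled trace for that endpoint."""
    candidates = [t for t in traces if t["endpoint"] == endpoint]
    if not candidates:
        return None
    slowest = max(candidates, key=lambda t: sum(d for _, d in t["spans"]))
    return max(slowest["spans"], key=lambda span: span[1])


# Usage with made-up data: per-endpoint latencies from metrics, plus a few sampled traces.
latencies = {"/search": [40.0] * 100, "/checkout": [60.0] * 97 + [900.0] * 3}
traces = [
    {"endpoint": "/checkout", "spans": [("auth", 10.0), ("inventory.rpc", 30.0), ("db.query", 20.0)]},
    {"endpoint": "/checkout", "spans": [("auth", 10.0), ("inventory.rpc", 35.0), ("db.query", 850.0)]},
    {"endpoint": "/search", "spans": [("index.lookup", 40.0)]},
]

suspect = narrow_with_metrics(latencies, threshold_ms=300.0)
print(suspect)                               # /checkout
print(explain_with_traces(traces, suspect))  # ('db.query', 850.0)
```

The ordering is the point of the design: the cheap aggregate scan runs over everything, and the expensive per-trace inspection runs only on the one endpoint the metrics flagged.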

Why this compounds

Each combined investigation deepens the team’s observability fluency. The patterns transfer between services; new services inherit the metric/trace conventions instead of recreating them.

Faster root cause. Using the right signal for each question cuts MTTR on recurring incident classes.
Cost efficiency. Sampling traces and aggregating metrics keeps observability spend matched to value.
Engineering culture shifts. Investigation moves from guessing to evidence. PR reviews start citing trace data.
Year-one investment, year-two habit. The first combined investigation is a heavy lift; by year two, the metric-then-trace flow is muscle memory.
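The cost-efficiency point about sampling traces, together with the per-tier budget described in the approach, can be sketched as deterministic head-based sampling. The tier names, sample rates, and request rate below are hypothetical placeholders, and the hashing scheme is one simple choice, similar in spirit to ratio-based samplers such as OpenTelemetry's `TraceIdRatioBased` but not its exact algorithm.

```python
import hashlib

# Hypothetical per-tier trace sample rates; real values come from each team's budget.
SAMPLE_RATES = {"tier1": 0.10, "tier2": 0.01, "tier3": 0.001}


def keep_trace(trace_id: str, tier: str) -> bool:
    """Head-based sampling decided from the trace ID alone.

    The same trace ID always maps to the same bucket, so services that share
    a tier's rate make the same keep/drop decision for a given trace.
    """
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    # Keep the top 53 bits so the division is exact and the result is in [0, 1).
    bucket = (int.from_bytes(digest[:8], "big") >> 11) / 2**53
    return bucket < SAMPLE_RATES[tier]


def traces_kept_per_day(requests_per_second: float, tier: str) -> float:
    """Budget check: expected number of traces stored per day at the tier's rate."""
    return requests_per_second * 86_400 * SAMPLE_RATES[tier]


# Usage: a hypothetical tier-2 service handling 500 requests per second.
print(traces_kept_per_day(500, "tier2"))  # 432000.0
kept = sum(keep_trace(f"trace-{i}", "tier2") for i in range(100_000))
print(kept / 100_000)                     # close to 0.01
```

Because every tier compares the same bucket against its own rate, a trace kept at a low rate is also kept at any higher rate, which keeps sampling decisions predictable when tiers are retuned.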