Observability Cost Engineering: Cutting Spend Without Losing Signal

Most observability bills can be cut in half without losing diagnostic capability. The discipline is mechanical; the savings are large.

Where the bill actually goes

Observability bills break down predictably: ingest (per-event cost), retention (per-GB-day), query (per-query or per-engineer), and platform fees. Most teams cannot tell which dominates.

First step: get the breakdown. Ask your vendor for a cost report by metric, by service, by retention tier. Without the breakdown, optimisation is guessing.
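
Once the report arrives, the first pass is simple aggregation. The sketch below assumes the vendor export can be reduced to rows of (service, metric, category, monthly cost); the row layout, the service and metric names, and the dollar figures are all hypothetical.

```python
from collections import defaultdict

# Hypothetical rows from a vendor cost export:
# (service, metric, category, monthly_usd).
# Categories follow the breakdown above: ingest, retention, query, platform.
COST_ROWS = [
    ("checkout", "http_requests_total", "ingest", 4200.0),
    ("checkout", "http_requests_total", "retention", 1800.0),
    ("search", "query_latency_seconds", "ingest", 2500.0),
    ("search", "query_latency_seconds", "query", 500.0),
    ("shared", "platform_fee", "platform", 1000.0),
]


def breakdown(rows, key_index):
    """Sum monthly cost by one column; return (key, usd, share) sorted by cost."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key_index]] += row[3]
    grand_total = sum(totals.values())
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    return [(key, usd, usd / grand_total) for key, usd in ranked]


by_category = breakdown(COST_ROWS, key_index=2)
by_service = breakdown(COST_ROWS, key_index=0)

for key, usd, share in by_category:
    print(f"{key:<10} ${usd:>9.2f}  {share:6.1%}")
```

Running the same function over the metric column gives the list of most expensive metrics to target first.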

Cardinality reduction

Cardinality is usually the biggest single lever. Every per-user, per-request, or per-URL label multiplies the number of time series you store and pay for. The cardinality-explosion playbook (covered separately) typically cuts spend by 30-50% on its own.

The discipline: a series budget for each metric, a weekly review of the budgets, and a named owner for every metric.
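
A series budget can be checked mechanically. The following is a minimal sketch, assuming you can export active series as (metric name, label dict) pairs; the budget numbers, the default, and the metric names are illustrative assumptions, not recommendations.

```python
from collections import defaultdict

# Hypothetical per-metric budgets (maximum active series); numbers are illustrative.
SERIES_BUDGETS = {"http_requests_total": 3, "queue_depth": 10}
DEFAULT_BUDGET = 5


def series_counts(samples):
    """Count distinct label sets (that is, series) per metric name."""
    seen = defaultdict(set)
    for metric, labels in samples:
        seen[metric].add(frozenset(labels.items()))
    return {metric: len(label_sets) for metric, label_sets in seen.items()}


def over_budget(samples, budgets=SERIES_BUDGETS, default=DEFAULT_BUDGET):
    """Return {metric: (series_count, budget)} for metrics above their budget."""
    counts = series_counts(samples)
    return {
        metric: (count, budgets.get(metric, default))
        for metric, count in counts.items()
        if count > budgets.get(metric, default)
    }


# A per-user label turns every user into a new series.
samples = [
    ("http_requests_total", {"route": "/cart", "user_id": str(uid)})
    for uid in range(4)
] + [("queue_depth", {"queue": "email"})]

violations = over_budget(samples)
print(violations)
```

The output of a check like this is what the weekly review would look at: which metrics are over budget, and who owns them.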

Sampling strategy

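Sampling cuts span volume directly. Tail-based sampling decides after a trace completes, so it can apply rules such as ‘keep all errors, keep every trace slower than p99’. That typically gives an 80%+ reduction at little diagnostic cost, because the traces dropped are the fast, successful ones.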
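
The two rules can be written as a single keep/drop decision made once the trace has finished. This is a sketch under stated assumptions: the Trace type is hypothetical, the p99 threshold is assumed to be computed elsewhere and passed in, and the small baseline rate for ordinary traces is an addition of this example rather than one of the rules above.

```python
import random
from dataclasses import dataclass


@dataclass
class Trace:
    """Hypothetical summary of a completed trace."""
    trace_id: str
    duration_ms: float
    has_error: bool


def keep_trace(trace, p99_ms, baseline_rate=0.05, rng=random.random):
    """Tail-based decision: runs after the trace completes."""
    if trace.has_error:
        return True  # rule 1: keep all errors
    if trace.duration_ms >= p99_ms:
        return True  # rule 2: keep the slow tail
    # Assumption: keep a small random share of normal traces as a baseline.
    return rng() < baseline_rate


traces = [
    Trace("a", duration_ms=12.0, has_error=True),
    Trace("b", duration_ms=950.0, has_error=False),
    Trace("c", duration_ms=20.0, has_error=False),
]
kept = [t.trace_id for t in traces if keep_trace(t, p99_ms=500.0)]
print(kept)
```

Setting baseline_rate to 0 reduces the function to exactly the two rules.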

Apply the same to logs: structured logs at INFO sampled at 10% in production; ERROR retained at 100%.
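
In Python's standard logging module, one way to express this is a filter attached to the handler. The sketch below is an illustration, not a prescribed setup: the class name and logger name are hypothetical, and passing WARNING through unsampled is an assumption, since only INFO and ERROR are specified above.

```python
import logging
import random


class LevelSampler(logging.Filter):
    """Sample records at INFO and below; pass everything above INFO."""

    def __init__(self, rate=0.10, rng=random.random):
        super().__init__()
        self.rate = rate
        self.rng = rng

    def filter(self, record):
        if record.levelno > logging.INFO:
            return True  # WARNING, ERROR, CRITICAL retained at 100%
        return self.rng() < self.rate  # INFO and DEBUG sampled at `rate`


logger = logging.getLogger("checkout")  # hypothetical logger name
handler = logging.StreamHandler()
handler.addFilter(LevelSampler(rate=0.10))
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.error("payment failed")  # always emitted
logger.info("cart viewed")      # emitted roughly one time in ten
```

Random per-line sampling can split the log lines of one request. If that matters, deciding by a hash of the request ID instead keeps or drops a request's lines together.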

Retention tiering

Hot tier (queryable in seconds, expensive): 7-14 days. Warm tier (queryable in minutes, cheaper): 30-90 days. Cold tier (S3, queryable in hours, cheapest): 1+ year.

Most queries hit hot data. Tier accordingly. The savings on retention are usually the second-biggest after cardinality.
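
The arithmetic behind tiering is easy to check. The sketch below assumes boundaries of 14, 90, and 365 days, chosen from the ranges above, and per-GB-day prices that are purely illustrative; substitute your vendor's rates.

```python
# (name, first_day, last_day, usd_per_gb_day); prices are illustrative assumptions.
TIERS = [
    ("hot", 1, 14, 0.10),
    ("warm", 15, 90, 0.02),
    ("cold", 91, 365, 0.002),
]


def tier_for_age(age_days, tiers=TIERS):
    """Return the tier holding data of this age, or None once it has expired."""
    for name, first_day, last_day, _price in tiers:
        if first_day <= age_days <= last_day:
            return name
    return None


def daily_retention_cost(daily_gb, tiers=TIERS):
    """Steady-state cost per day: each tier holds daily_gb for every day it spans."""
    return sum(
        daily_gb * (last_day - first_day + 1) * price
        for _name, first_day, last_day, price in tiers
    )


daily_gb = 100
tiered = daily_retention_cost(daily_gb)
all_hot = daily_retention_cost(daily_gb, tiers=[("hot", 1, 365, 0.10)])
print(f"tiered: ${tiered:.2f}/day, all hot: ${all_hot:.2f}/day")
```

With these assumed prices, keeping a full year in the hot tier costs roughly ten times as much as the tiered layout; the real ratio depends entirely on your rates.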

Cost-aware dashboards

Three failure modes recur.
Optimising spend without measuring query usage: you delete the metric the team needed.
Aggressive sampling without keep rules: the trace you need is gone.
Treating the cost cut as a one-time project: spend creeps back without the ongoing discipline.
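
Deleting a metric without checking who queries it is avoidable if query usage is measured first. A minimal sketch, assuming your platform can export a query log where each entry lists the metrics it touched; the log format and metric names are hypothetical.

```python
from collections import Counter


def deletion_candidates(stored_metrics, query_log, min_queries=1):
    """Return stored metrics queried fewer than min_queries times in the log window."""
    hits = Counter(
        metric for entry in query_log for metric in entry["metrics"]
    )
    return sorted(m for m in stored_metrics if hits[m] < min_queries)


stored = ["http_requests_total", "queue_depth", "legacy_cache_hits"]

# Hypothetical query log covering the review window.
query_log = [
    {"user": "alice", "metrics": ["http_requests_total"]},
    {"user": "bob", "metrics": ["http_requests_total", "queue_depth"]},
]

candidates = deletion_candidates(stored, query_log)
print(candidates)
```

An unqueried metric is only a candidate. Alert rules and rarely opened incident dashboards may not show up in a short log window, so confirm with the owner before deleting.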

What to do this week

Three moves. (1) Get the per-metric and per-tier cost breakdown from your vendor. (2) Apply the cardinality playbook to your three most expensive metrics. (3) Schedule a recurring quarterly cost review.