By Samson Tanimawo, PhD. Published Feb 15, 2026.

Alert Cost Tracking

Each alert has a cost. Track it.

What an alert costs

Direct cost: usage-based SaaS fees from vendors such as Datadog, New Relic, and PagerDuty. Where a plan meters events or notifications, a noisy rule firing 1,000 times a month consumes volume you pay for.

Indirect cost: pager interruption time. A single page costs roughly 25 minutes of focused work even when it's noise.

Capacity cost: rule evaluation load on Prometheus, plus notification processing in Alertmanager. High-cardinality rules evaluated at 1m intervals burn CPU continuously.

How to track

Tag every rule with owner_team, service, and tier. Aggregate fire count by tag weekly.
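A minimal sketch of the weekly aggregation, assuming fire events have already been collected as (timestamp, labels) pairs. The FIRES sample data and the weekly_counts helper are hypothetical names used for illustration only.

```python
from collections import Counter
from datetime import datetime

# Hypothetical fire records: (ISO timestamp, labels carried by the rule).
FIRES = [
    ("2026-02-02T03:14:00", {"owner_team": "payments", "service": "checkout", "tier": "1"}),
    ("2026-02-03T11:40:00", {"owner_team": "payments", "service": "checkout", "tier": "1"}),
    ("2026-02-04T09:05:00", {"owner_team": "search", "service": "indexer", "tier": "2"}),
    ("2026-02-10T22:30:00", {"owner_team": "payments", "service": "ledger", "tier": "1"}),
]


def weekly_counts(fires, tag):
    """Count fires per (ISO year, ISO week, tag value)."""
    counts = Counter()
    for timestamp, labels in fires:
        iso = datetime.fromisoformat(timestamp).isocalendar()
        # Rules missing the tag are grouped under "untagged" so they stay visible.
        counts[(iso[0], iso[1], labels.get(tag, "untagged"))] += 1
    return counts


for key, count in sorted(weekly_counts(FIRES, "owner_team").items()):
    print(key, count)
```

Grouping untagged rules under their own key is a deliberate choice here: a large "untagged" bucket is itself a finding worth publishing.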

Poll Alertmanager's /api/v2/alerts endpoint, which returns the alerts that are currently active, and push the resulting fire counts to a metrics backend. Datadog and PagerDuty expose similar APIs.
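One way to turn the endpoint into fire counts is to poll it and count alerts that have not been seen before. The sketch below assumes Alertmanager is reachable at localhost:9093 and that each returned alert carries fingerprint, startsAt, and labels fields; the function names and the in-memory seen set are hypothetical, and a real collector would persist that state between runs.

```python
import json
import urllib.request
from collections import Counter

# Assumption: Alertmanager listens here; adjust for your environment.
ALERTMANAGER_URL = "http://localhost:9093"


def fetch_active_alerts(base_url=ALERTMANAGER_URL):
    """Return the alerts Alertmanager currently holds as a list of dicts."""
    with urllib.request.urlopen(f"{base_url}/api/v2/alerts", timeout=10) as resp:
        return json.load(resp)


def new_fires(alerts, seen):
    """Count alerts not seen in earlier polls, keyed by alertname.

    An alert is identified by fingerprint plus startsAt, so a rule that
    resolves and then fires again is counted as a new fire.
    """
    counts = Counter()
    for alert in alerts:
        key = (alert["fingerprint"], alert["startsAt"])
        if key not in seen:
            seen.add(key)
            counts[alert["labels"].get("alertname", "unknown")] += 1
    return counts
```

A scheduler would call new_fires(fetch_active_alerts(), seen) on every poll and forward the counts to the metrics backend. Alerts that fire and resolve between two polls are missed, so the polling interval should be shorter than the shortest alert you care about.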

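Compute the daily cost per rule: (vendor_fee + fires * minutes_per_page * cost_per_minute) / 30, where vendor_fee and fires are monthly totals and cost_per_minute is the loaded cost of an on-call engineer's time. Converting interruption minutes to money keeps both terms in the same unit. Publish a top-10 most-expensive list.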
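The cost calculation can be sketched as follows. Interruption minutes are converted to money before being added to the vendor fee. The 25-minute figure comes from the estimate earlier in this article; the hourly rate, the RuleStats type, and the function names are illustrative assumptions, not measured values.

```python
from dataclasses import dataclass

MINUTES_PER_PAGE = 25         # interruption estimate used in this article
LOADED_RATE_PER_HOUR = 120.0  # hypothetical loaded engineer cost, in dollars


@dataclass
class RuleStats:
    name: str
    monthly_vendor_fee: float  # dollars attributed to this rule per month
    monthly_fires: int         # pages sent by this rule per month


def daily_cost(rule, minutes_per_page=MINUTES_PER_PAGE, rate_per_hour=LOADED_RATE_PER_HOUR):
    """Average daily cost in dollars over a 30-day month."""
    interruption = rule.monthly_fires * minutes_per_page * rate_per_hour / 60
    return (rule.monthly_vendor_fee + interruption) / 30


def top_expensive(rules, n=10):
    """Return the n costliest rules, most expensive first."""
    return sorted(rules, key=daily_cost, reverse=True)[:n]
```

For example, a rule with a $30 monthly fee and 12 pages a month costs (30 + 12 * 25 * 2) / 30 = $21 a day at these assumed rates. The interruption term dwarfs the vendor fee, which is typical once pages are involved.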

Acting on the data

The highest-cost rule each month gets a mandatory review. Either fix the underlying issue or delete the rule.

Tie alert cost to team budgets. SRE pays the bill until the owning team takes ownership.

Reject new rules from teams whose existing rules are in the top quintile by cost.
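A hedged sketch of the quintile gate, assuming each rule already has a computed cost and an owning team. The function name and the (rule, team, cost) tuple layout are hypothetical.

```python
import math


def blocked_teams(rules):
    """Return teams that own at least one rule in the top 20% by cost.

    `rules` is a list of (rule_name, owner_team, cost) tuples.
    """
    ranked = sorted(rules, key=lambda rule: rule[2], reverse=True)
    # Round up so a small catalog still has a non-empty top quintile.
    cutoff = max(1, math.ceil(len(ranked) * 0.2))
    return {team for _, team, _ in ranked[:cutoff]}
```

A review bot or CI check could call blocked_teams before accepting a new rule definition and refuse the change when the submitting team is in the returned set.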

Realistic savings

A 200-rule catalog typically has 15 to 25 rules that account for over half of all alert fires.
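To check that concentration against your own catalog, a short sketch like the one below finds the smallest set of rules covering a given share of fires. The function name and input shape (a mapping of rule name to fire count) are assumptions.

```python
def rules_covering_share(fires_by_rule, share=0.5):
    """Return the noisiest rules that together reach `share` of all fires."""
    total = sum(fires_by_rule.values())
    selected = []
    running = 0
    ranked = sorted(fires_by_rule.items(), key=lambda item: item[1], reverse=True)
    for name, fires in ranked:
        # Stop as soon as the rules picked so far cover the target share.
        if running >= share * total:
            break
        selected.append(name)
        running += fires
    return selected
```

If the returned list is a small fraction of the catalog, the cleanup effort can focus on those rules first.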

Removing those usually drops vendor event counts by 40 to 60%. How much of that reaches the invoice depends on how your PagerDuty and Datadog plans meter usage, so check which line items are volume-based.

On-call satisfaction scores rise within one rotation cycle. The cost-per-page metric makes the trade visible to leadership.

Start small

Week 1: collect fire counts, no actions. Week 2: publish the top-20 list to engineering. Week 3: delete or fix the top 5.

Skip vendor-supplied noise reduction features until the catalog is clean. They mask the problem rather than solve it.

Make the fire-count dashboard public. Visibility is the cheapest intervention.