By Samson Tanimawo, PhD. Published Dec 14, 2025.

Alert Investment Priorities

Where to invest alert engineering time, ranked by ROI.

The question

Alert engineering time is finite. Where does the next hour of tuning produce the biggest return?

Most teams default to whichever alert fired last night. That's reactive, not optimal.

Rank alerts by the product of page volume, severity, and customer impact. The top 3 alerts usually drive 60% of on-call pain.
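The ranking above can be sketched in a few lines. This is a minimal illustration, not a prescribed formula: the `AlertStats` type, the 1-to-3 scales for severity and customer impact, and the sample alert names are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlertStats:
    name: str
    pages: int            # pages in the review window
    severity: int         # assumed scale: 1 (low) to 3 (high)
    customer_impact: int  # assumed scale: 1 (none) to 3 (user-visible)


def score(alert: AlertStats) -> int:
    """Page volume times severity times customer impact."""
    return alert.pages * alert.severity * alert.customer_impact


def rank(alerts: list[AlertStats]) -> list[AlertStats]:
    """Highest score first; ties keep their input order."""
    return sorted(alerts, key=score, reverse=True)


def top_n_share(alerts: list[AlertStats], n: int = 3) -> float:
    """Fraction of the total score carried by the n highest-scoring alerts."""
    total = sum(score(a) for a in alerts)
    if total == 0:
        return 0.0
    return sum(score(a) for a in rank(alerts)[:n]) / total


# Hypothetical sample data for illustration only.
SAMPLE = [
    AlertStats("disk_full", pages=40, severity=1, customer_impact=1),
    AlertStats("api_5xx", pages=12, severity=3, customer_impact=3),
    AlertStats("queue_lag", pages=20, severity=2, customer_impact=1),
    AlertStats("cert_expiry", pages=1, severity=3, customer_impact=2),
]
```

Calling `rank(SAMPLE)` puts `api_5xx` ahead of `disk_full` even though it paged less often, which is the point of weighting by severity and impact rather than raw volume.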

Priority ranking

Tier 1: alerts that page more than once per shift on average. Fix these first; they are the noise that destroys rotations.

Tier 2: alerts with low MTTA (mean time to acknowledge) but no clear customer impact. These waste cognitive cycles even when they are fast to ack.

Tier 3: alerts that fire monthly but always require novel debugging. Worth investing in better runbooks, not better detection.
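The three tiers can be expressed as a small classifier. The sketch below is one possible reading of them: the field names, the 5-minute "fast ack" cutoff, and the "at most two fires per month" cutoff are assumptions, not values from this article.

```python
from dataclasses import dataclass
from typing import Optional

FAST_ACK_MINUTES = 5.0   # assumed cutoff for "low MTTA"
MONTHLY_MAX_FIRES = 2.0  # assumed cutoff for "fires roughly monthly"


@dataclass(frozen=True)
class AlertProfile:
    name: str
    pages_per_shift: float       # average pages per on-call shift
    mtta_minutes: float          # mean time to acknowledge
    has_customer_impact: bool
    fires_per_month: float
    needs_novel_debugging: bool  # every occurrence needs fresh investigation


def tier(alert: AlertProfile) -> Optional[int]:
    """Return 1, 2, or 3 for the matching tier, or None if no tier applies."""
    if alert.pages_per_shift > 1:
        return 1  # pages more than once per shift: fix first
    if alert.mtta_minutes <= FAST_ACK_MINUTES and not alert.has_customer_impact:
        return 2  # fast to ack, but nobody can say who it hurts
    if alert.fires_per_month <= MONTHLY_MAX_FIRES and alert.needs_novel_debugging:
        return 3  # rare but expensive: invest in the runbook
    return None
```

Checks run in tier order, so an alert that qualifies for more than one tier lands in the highest-priority one.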

Where to spend the time

Tighten thresholds first. 60% of noise comes from thresholds set during the panic of an old incident and never revisited.
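One way to revisit a threshold is to replay historical metric samples against candidate values and count how often each would have fired. The sketch below assumes a simple "fire when the value rises above the threshold" rule and made-up sample data; real alert rules often add a duration or evaluation window that this ignores.

```python
def count_firings(samples: list[float], threshold: float) -> int:
    """Count distinct episodes where the metric rises above the threshold.

    Consecutive samples above the threshold count as one firing.
    """
    firings = 0
    above = False
    for value in samples:
        now_above = value > threshold
        if now_above and not above:
            firings += 1
        above = now_above
    return firings


def compare_thresholds(samples: list[float], candidates: list[float]) -> dict[float, int]:
    """Map each candidate threshold to the number of firings it would produce."""
    return {t: count_firings(samples, t) for t in candidates}


# Hypothetical CPU-percent samples for illustration only.
HISTORY = [50.0, 82.0, 79.0, 85.0, 91.0, 70.0, 96.0, 60.0]
```

A candidate is only an improvement if it still fires during the incidents you care about, so it helps to run the same replay over windows that contain known incidents before changing the rule.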

Add customer-impact context next. It reduces triage time even if the alert keeps firing.

Build dependency suppression last. High leverage but expensive to set up; only worth it after the basics are clean.
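The core idea of dependency suppression fits in a short sketch: page for an alert only when none of its upstream dependencies is also firing. The dependency map and alert names below are hypothetical, and a production setup would normally use the alerting tool's own inhibition or suppression rules rather than custom code.

```python
# Hypothetical map: alert -> alerts for the things it directly depends on.
DEPENDS_ON = {
    "checkout_latency": {"payments_db_down"},
    "payments_db_down": {"network_partition"},
}


def upstream(alert: str, depends_on: dict[str, set[str]]) -> set[str]:
    """All direct and transitive dependencies of an alert, excluding itself."""
    seen: set[str] = set()
    stack = list(depends_on.get(alert, ()))
    while stack:
        dep = stack.pop()
        if dep in seen:
            continue
        seen.add(dep)
        stack.extend(depends_on.get(dep, ()))
    seen.discard(alert)  # a cycle must not make an alert suppress itself
    return seen


def pages_to_send(firing: set[str], depends_on: dict[str, set[str]]) -> set[str]:
    """Keep only firing alerts that have no firing upstream dependency."""
    return {a for a in firing if not (upstream(a, depends_on) & firing)}
```

With this map, a network partition that also trips the database and checkout alerts produces one page for `network_partition` instead of three. The expensive part is building and maintaining the map itself.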

Where not to spend

ML anomaly detection on a fundamentally noisy signal. Cleaning the signal source pays better than smarter detection.

Custom dashboards for alerts that are about to be deleted. Sequence the work; don't gold-plate doomed pages.

Tooling that requires a vendor migration to deliver. The migration cost almost always exceeds the alert ROI.

Apply this week

Pull the last 30 days of pages. Rank by volume. Pick the top 3.
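As a rough sketch of that first pass, assume the page log has been exported as `(alert_name, fired_at)` pairs. That export shape is an assumption; the actual format depends on your paging tool.

```python
from collections import Counter
from datetime import datetime, timedelta


def top_alerts(
    pages: list[tuple[str, datetime]],
    now: datetime,
    days: int = 30,
    n: int = 3,
) -> list[tuple[str, int]]:
    """Return the n most frequent alerts in the last `days` days as (name, count)."""
    cutoff = now - timedelta(days=days)
    counts = Counter(name for name, fired_at in pages if cutoff <= fired_at <= now)
    return counts.most_common(n)
```

Volume alone is a quick starting point. Where severity and customer-impact data exist, the same counts can feed a weighted score.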

Allocate 4 hours per top alert. Tune threshold, add impact text, link runbook. Measure for 2 weeks.
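The two-week measurement can be as simple as comparing page counts for the same alert in equal windows before and after the change. The sketch below assumes you have the alert's page timestamps and the time the tuning went live; both names are illustrative.

```python
from datetime import datetime, timedelta


def pages_in_window(timestamps: list[datetime], start: datetime, end: datetime) -> int:
    """Count pages with start <= timestamp < end."""
    return sum(1 for t in timestamps if start <= t < end)


def before_after(
    timestamps: list[datetime],
    tuned_at: datetime,
    window: timedelta = timedelta(days=14),
) -> tuple[int, int]:
    """Pages in the window before the change and in the window after it."""
    before = pages_in_window(timestamps, tuned_at - window, tuned_at)
    after = pages_in_window(timestamps, tuned_at, tuned_at + window)
    return before, after
```

Equal-length windows keep the comparison fair, though a two-week sample is small enough that a single incident can skew it.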

Repeat monthly. The top 3 rotates; the discipline does not.