Symptom-Based vs Cause-Based Alerts: Which Wins

The default modern advice is ‘alert on symptoms, not causes.’ The advice is right roughly 80% of the time. The other 20% matters.

What each means

Symptom alerts: “users see slow responses.” Cause alerts: “CPU is at 95%.”

Symptoms are user-perceived; causes are infrastructure-measured. Most teams alert on causes; users care about symptoms.

Why symptoms win the default

Symptoms catch what matters and ignore what does not. CPU at 95% on a workload that handles it fine is not an incident; pages on it are noise.
Symptoms also catch cause combinations. A cause-only alert misses outages where two degraded systems, each still under its own threshold, combine to cause user pain.
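
A minimal Python sketch of that difference. The Snapshot record, the function names, and every threshold (90% CPU, a 500 ms p99 latency target, a 1% error rate) are assumptions made up for illustration, not values from any real monitoring tool.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    """One hypothetical reading of a service's health."""
    cpu_percent: float
    p99_latency_ms: float
    error_rate: float  # fraction of failed requests, 0.0 to 1.0


def cause_alert(s: Snapshot, cpu_threshold: float = 90.0) -> bool:
    # Infrastructure-measured: fires on resource usage alone.
    return s.cpu_percent >= cpu_threshold


def symptom_alert(s: Snapshot, latency_target_ms: float = 500.0,
                  max_error_rate: float = 0.01) -> bool:
    # User-perceived: fires only when requests are slow or failing.
    return s.p99_latency_ms > latency_target_ms or s.error_rate > max_error_rate


# Hot CPU, but users are fine: the cause alert is noise.
busy_but_healthy = Snapshot(cpu_percent=95.0, p99_latency_ms=180.0, error_rate=0.001)

# CPU looks calm, but users are hurting: only the symptom alert sees it.
quiet_but_broken = Snapshot(cpu_percent=60.0, p99_latency_ms=2400.0, error_rate=0.04)

for name, snap in [("busy_but_healthy", busy_but_healthy),
                   ("quiet_but_broken", quiet_but_broken)]:
    print(name, "cause:", cause_alert(snap), "symptom:", symptom_alert(snap))
```

The second snapshot is the combination case: no single resource crosses its threshold, yet the user-facing numbers are well past theirs.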

When causes win

Causes win for early warning. CPU climbing slowly is a leading indicator; user-perceived slowness is the lagging indicator.
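
One way to turn a slow climb into an early warning is to fit a straight line to recent samples and estimate when it crosses a limit. The sketch below assumes a linear trend, a hypothetical minutes_until_threshold helper, and made-up sample data; real usage is rarely this tidy, so treat the estimate as a hint rather than a forecast.

```python
from typing import List, Optional, Tuple


def minutes_until_threshold(samples: List[Tuple[float, float]],
                            threshold: float) -> Optional[float]:
    """Estimate minutes from the last sample until a least-squares line
    through (minute, value) samples reaches `threshold`.

    Returns None when there are too few samples or the trend is not rising.
    """
    n = len(samples)
    if n < 2:
        return None
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        return None  # all samples share one timestamp; no slope to fit
    slope = sum((t - mean_t) * (v - mean_v) for t, v in samples) / var_t
    if slope <= 0:
        return None  # flat or falling: nothing to warn about
    intercept = mean_v - slope * mean_t
    crossing_minute = (threshold - intercept) / slope
    last_minute = samples[-1][0]
    # Clamp at zero if the fitted line says the threshold is already behind us.
    return max(0.0, crossing_minute - last_minute)


# Hypothetical CPU readings: (minutes since start, CPU percent).
cpu_samples = [(0, 60.0), (10, 62.0), (20, 64.0), (30, 66.0)]
eta = minutes_until_threshold(cpu_samples, threshold=90.0)
print(f"estimated minutes until 90% CPU: {eta}")
```

A ticket that says "about two hours of headroom left at this rate" is more useful than a page that says "CPU is high."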

Causes also win when you cannot measure the symptom directly, for example on backend services with no user-facing metric.

The hybrid pattern

Page on symptoms (high signal, user-visible). Ticket on causes (early warning, internal). The two-tier pattern keeps pages clean and signals trends early.

Most teams that go symptom-only lose their early warning. The hybrid is the realistic posture.
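
The two-tier rule is small enough to write down. This is a sketch only: Kind, Route, Alert, and the alert names are hypothetical, and a real setup would express the same rule in its alerting tool's routing configuration.

```python
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    SYMPTOM = "symptom"  # user-visible: latency, errors
    CAUSE = "cause"      # infrastructure-measured: CPU, disk, queue depth


class Route(Enum):
    PAGE = "page"        # wakes someone up
    TICKET = "ticket"    # lands in a queue for working hours


@dataclass(frozen=True)
class Alert:
    name: str
    kind: Kind


def route(alert: Alert) -> Route:
    # Symptoms page; causes become tickets.
    return Route.PAGE if alert.kind is Kind.SYMPTOM else Route.TICKET


firing = [
    Alert("checkout_p99_latency_high", Kind.SYMPTOM),
    Alert("db_cpu_above_90", Kind.CAUSE),
]
for alert in firing:
    print(f"{alert.name}: {route(alert).value}")
```

The point of keeping the rule this blunt is that nobody has to argue about individual alerts at 3 a.m.; the classification happens once, when the alert is written.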

Antipatterns

Cause-only alerts. They page on noise and miss real impact.
Symptom-only alerts. No early warning.
Both at page tier. The same incident pages twice, which doubles pager noise.

What to do this week

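Three moves. (1) Apply the two-tier pattern to your noisiest alert: if it is a cause alert, demote it to a ticket and confirm a symptom alert covers the same failure. (2) Measure pages-per-shift for one week before and one week after the change. (3) Schedule a quarterly alert review so the discipline survives team turnover.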
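
For move (2), a rough way to compute pages-per-shift from a list of page timestamps. The 12-hour shift length, the function name, and the sample data are all assumptions; swap in your own rotation and an export from your paging tool.

```python
import math
from datetime import datetime, timedelta
from typing import List


def pages_per_shift(page_times: List[datetime], window_start: datetime,
                    window_end: datetime, shift_hours: int = 12) -> float:
    """Average number of pages per shift within [window_start, window_end)."""
    shift = timedelta(hours=shift_hours)
    # Dividing two timedeltas yields a float; round up to count partial shifts.
    total_shifts = math.ceil((window_end - window_start) / shift)
    if total_shifts <= 0:
        raise ValueError("window_end must be after window_start")
    in_window = [t for t in page_times if window_start <= t < window_end]
    return len(in_window) / total_shifts


# Made-up data: one week before the change, one week after.
before_start = datetime(2024, 1, 1)
after_start = datetime(2024, 1, 8)
week = timedelta(days=7)

pages_before = [before_start + timedelta(hours=6 * i) for i in range(28)]
pages_after = [after_start + timedelta(days=i) for i in range(7)]

before = pages_per_shift(pages_before, before_start, before_start + week)
after = pages_per_shift(pages_after, after_start, after_start + week)
print(f"before: {before:.2f} pages/shift, after: {after:.2f} pages/shift")
```

One week on each side is a small sample, so read the result as a direction, not a measurement.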