Alert vs Dashboard Decision

Some signals belong on dashboards, not in alerts.

The decision rule

The decision is action-driven. Alert when customer impact is happening or imminent, the required action is time-sensitive, and someone needs to act now. Put the signal on a dashboard when it is trending data, an aggregate metric, situational awareness, or input to post-hoc analysis: data that informs decisions but doesn’t demand immediate action. Mixing the two creates fatigue: dashboards full of pageable signals get ignored, and pages that should have been dashboard panels burn out the on-call.

Alert: customer impact + action + urgency. All three are required; together they are the test for paging.
Dashboard: trends, aggregates, awareness. Informs decisions; doesn’t demand action now.
Mixing creates fatigue. Dashboards full of pageable signals get ignored; over-paging burns out the on-call.
Per-signal placement decision. Each signal lands in either an alert or a dashboard, which supports clear ownership.
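
The decision rule can be written as a single predicate. A minimal Python sketch, assuming each signal is described by three hypothetical boolean fields (customer_impact, action_exists, time_sensitive); the names are illustrative and not taken from any monitoring tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signal:
    """Hypothetical description of a monitored signal."""
    name: str
    customer_impact: bool  # customer impact is happening or imminent
    action_exists: bool    # there is a concrete action (a runbook) to take
    time_sensitive: bool   # the action cannot wait until business hours


def place(signal: Signal) -> str:
    """Return 'alert' only when all three criteria hold; otherwise 'dashboard'."""
    if signal.customer_impact and signal.action_exists and signal.time_sensitive:
        return "alert"
    return "dashboard"


# Illustrative signals; names are hypothetical.
checkout_errors = Signal("checkout_5xx_rate", True, True, True)
cpu_80 = Signal("cpu_utilization_80pct", False, True, False)
print(place(checkout_errors))  # alert
print(place(cpu_80))           # dashboard
```

Because the three conditions are joined with `and`, failing any one of them sends the signal to a dashboard; a signal that is urgent but has no runbook gets no partial credit.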

Strict criteria for alerts

Three criteria must all hold for a signal to be an alert. Customer impact: real or imminent; a signal with no customer connection, such as CPU at 80%, belongs on a dashboard, not in a page. An action exists: an alert without a runbook is a notification of helplessness, so find an action or move the signal to a dashboard. Time-sensitive: if the action can wait until business hours, so can the alert.

Customer impact required. No customer connection means dashboard, not page.
Action exists. An alert without a runbook is a notification of helplessness; find an action or move it to a dashboard.
Time-sensitive. If action can wait until business hours, alert can wait.
Per-criterion check. All three are required; failing any one sends the signal to a dashboard.
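
Checking each criterion separately, rather than computing one yes/no answer, tells the alert owner which condition is missing. A sketch assuming a hypothetical AlertDefinition record (for example, parsed from alerting config); a missing runbook_url stands in for "no action exists", and the disk_70pct alert is an invented example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AlertDefinition:
    """Hypothetical alert record, e.g. parsed from alerting config."""
    name: str
    customer_facing: bool              # tied to real or imminent customer impact
    runbook_url: Optional[str]         # None means no documented action
    can_wait_for_business_hours: bool  # True means the action is not urgent


def failed_criteria(alert: AlertDefinition) -> List[str]:
    """List every criterion the alert fails; an empty list means it may page."""
    failures = []
    if not alert.customer_facing:
        failures.append("no customer impact: move to a dashboard")
    if not alert.runbook_url:
        failures.append("no runbook: find an action or move to a dashboard")
    if alert.can_wait_for_business_hours:
        failures.append("not time-sensitive: handle in business hours, not a page")
    return failures


disk = AlertDefinition("disk_70pct", customer_facing=False,
                       runbook_url=None, can_wait_for_business_hours=True)
for reason in failed_criteria(disk):
    print(f"{disk.name}: {reason}")
```

Returning reasons instead of a boolean makes the output usable in review: each string maps to one of the three criteria above.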

Dashboard criteria

Three categories belong on dashboards. Trends and aggregates: week-over-week and month-over-month metrics, capacity planning, SLO burn-down. Operational awareness: the on-call checks dashboards at the start of a shift, and during incidents dashboards inform responders but don’t drive paging. Stakeholder reports: business metrics, customer counts, and revenue, where the audience is decision makers.

Trends and aggregates. WoW, MoM metrics; capacity planning; SLO burn-down.
Operational awareness. Start-of-shift check; context during incidents.
Stakeholder reports. Business metrics, customer counts, revenue; audience is decision makers.
Per-dashboard owner. Each dashboard has an owner team; supports continued curation.
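
SLO burn-down is a typical trend panel: it shows how much error budget remains, which informs decisions without demanding an immediate action. A minimal sketch of the arithmetic, assuming a request-based SLO; the function name and figures are illustrative only.

```python
def error_budget_remaining(slo_target: float, total_requests: int,
                           failed_requests: int) -> float:
    """Fraction of the error budget still unspent for a request-based SLO.

    The budget is the number of failures the SLO tolerates over the window:
    allowed_failures = (1 - slo_target) * total_requests.
    A negative result means the budget is overspent.
    """
    allowed_failures = (1.0 - slo_target) * total_requests
    if allowed_failures <= 0:
        raise ValueError("an SLO target of 100% leaves no error budget")
    return 1.0 - failed_requests / allowed_failures


# 99.9% over 1,000,000 requests tolerates 1,000 failures; 250 failures spend 25%.
remaining = error_budget_remaining(0.999, 1_000_000, 250)
print(f"{remaining:.0%} of the error budget remains")
```

Plotted over the SLO window, this value is the burn-down line: a steady decline is something to discuss at planning time, not something to page on.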

Converting between them

Conversion goes both ways. Frequently firing alerts that operators dismiss without acting are dashboard candidates: track each alert’s action rate, and below 50% the signal is not an alert. Dashboard panels that surface real problems people only notice in postmortems are alert candidates; convert once the pattern repeats. Review both directions quarterly: each conversion is a small win, and the cumulative effect is a significant improvement in alert quality.

Below 50% action rate. Demote to dashboard; the alert isn’t earning its keep.
Postmortem pattern repeats. Dashboard panel becomes alert candidate.
Quarterly review, both directions. Each conversion is a small win; the wins add up to better alert quality.
Per-conversion documented rationale. Records why the move happened; supports investigation.
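
The quarterly review can be partly automated from firing history and postmortem notes. A sketch under stated assumptions: the Firing record is hypothetical, each entry in postmortem_panels is one postmortem citing a dashboard panel, and "the pattern repeats" is taken to mean cited in at least two postmortems (an assumed threshold, not one given above).

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple

ACTION_RATE_FLOOR = 0.5  # below this, the alert isn't earning its keep
REPEAT_THRESHOLD = 2     # assumption: "repeats" means cited in 2+ postmortems


@dataclass(frozen=True)
class Firing:
    """Hypothetical record of one alert firing and whether anyone acted on it."""
    alert: str
    acted_on: bool


def action_rates(firings: Iterable[Firing]) -> Dict[str, float]:
    """Per-alert share of firings that led to an action."""
    fired: Counter = Counter()
    acted: Counter = Counter()
    for f in firings:
        fired[f.alert] += 1
        if f.acted_on:
            acted[f.alert] += 1
    return {name: acted[name] / fired[name] for name in fired}


def quarterly_review(firings: Iterable[Firing],
                     postmortem_panels: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return (alerts to demote to dashboards, panels to promote to alerts)."""
    demote = sorted(name for name, rate in action_rates(firings).items()
                    if rate < ACTION_RATE_FLOOR)
    citations = Counter(postmortem_panels)
    promote = sorted(panel for panel, n in citations.items()
                     if n >= REPEAT_THRESHOLD)
    return demote, promote


firings = (
    [Firing("queue_depth_high", False)] * 9
    + [Firing("queue_depth_high", True)]
    + [Firing("checkout_5xx", True)] * 4
)
panels = ["replica_lag", "replica_lag", "cache_hit_ratio"]
demote, promote = quarterly_review(firings, panels)
print("demote:", demote)    # demote: ['queue_depth_high']
print("promote:", promote)  # promote: ['replica_lag']
```

The output is a candidate list, not a decision: each move still needs the documented rationale described above.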

Anti-patterns

Three anti-patterns survive too long. Dashboards full of red panels nobody investigates: dashboards are not alerts, and visual urgency creates anxiety without action. Alerts that exist for reassurance (“alert if too quiet”) without a clear meaning: define what “too quiet” means and what to do about it, or remove the alert. The same signal as both an alert and a dashboard panel: pick one based on the action, or make sure the two have different audiences and clear ownership.

Red dashboard panels. Dashboards aren’t alerts; visual urgency creates anxiety without action.
Reassurance alerts. “Alert if too quiet”; define meaning or remove.
Same signal both places. Pick one based on the action, or differentiate audiences and ownership.
Per-anti-pattern lint. CI catches the common cases; the discipline lives in the linter.
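
Which cases a CI linter can catch depends on what the config records. A sketch assuming hypothetical Alert and Panel records; it flags the three anti-patterns above, approximating “red panel nobody investigates” as a red threshold with no owner team, since actual investigation isn’t visible in config.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Alert:
    """Hypothetical alert config entry."""
    signal: str
    audience: str
    runbook_url: Optional[str] = None
    fires_on_absence: bool = False             # a "too quiet" alert
    expected_min_rate: Optional[float] = None  # what "too quiet" means, if defined


@dataclass
class Panel:
    """Hypothetical dashboard panel config entry."""
    signal: str
    audience: str
    owner: Optional[str] = None
    red_threshold: Optional[float] = None


def lint(alerts: List[Alert], panels: List[Panel]) -> List[str]:
    """Return one finding per anti-pattern instance; CI fails if any are found."""
    findings = []
    # Red panels nobody owns: visual urgency with no one to act on it.
    for p in panels:
        if p.red_threshold is not None and not p.owner:
            findings.append(f"red panel without owner: {p.signal}")
    # Reassurance alerts: "too quiet" must define a floor and an action.
    for a in alerts:
        if a.fires_on_absence and (a.expected_min_rate is None or not a.runbook_url):
            findings.append(f"undefined 'too quiet' alert: {a.signal}")
    # Same signal alerted and dashboarded for the same audience.
    alert_keys = {(a.signal, a.audience) for a in alerts}
    for p in panels:
        if (p.signal, p.audience) in alert_keys:
            findings.append(f"alerted and dashboarded for {p.audience}: {p.signal}")
    return findings


alerts = [
    Alert("ingest_events_per_min", "oncall", fires_on_absence=True),
    Alert("checkout_5xx_rate", "oncall", runbook_url="https://example.com/runbooks/checkout"),
]
panels = [
    Panel("checkout_5xx_rate", "oncall", owner="payments"),
    Panel("cpu_utilization", "oncall", red_threshold=0.8),
]
findings = lint(alerts, panels)
for finding in findings:
    print(finding)
```

The same-signal check keys on (signal, audience), so the escape hatch in the text, different audiences with clear ownership, passes the lint.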