Business-Impact-Tagged Alerts

Tag alerts with business impact for prioritisation.

Why tag with business impact

Business-impact tags let one alert serve multiple audiences. Engineers need technical context and executives need business context; tags such as “revenue-impacting”, “compliance-critical”, and “customer-facing” give each audience the right view automatically and let routing match the stakes.

Two audiences, one alert. Engineers need technical context; executives need business context; tags route both correctly.
Common tags. revenue-impacting, compliance-critical, and customer-facing give each alert a human-readable business framing.
Routing matches stakes. Without tags every alert escalates the same way; with tags the routing matches the stakes.
Per-service tagging discipline. Tags live with the service definition; every new service inherits the convention.

The tag set

Keep the tag set small: five to seven tags at most, for example revenue, compliance, data-loss, security, reputation, and internal. Apply tags at the service level rather than per alert; every alert on the checkout service is revenue-impacting, so tag once per service. Maintain the tags in the service catalog so drift is visible.

5-7 tags maximum. revenue, compliance, data-loss, security, reputation, internal; small enough to remember.
Service-level application. Every alert on the checkout service is revenue-impacting; tag once per service.
Service catalog as source. Keep tags in a catalog such as Backstage or OpsLevel and sync them from there into PagerDuty and the alerting system.
Per-tag definition committed. Document what “revenue-impacting” means so the tag is applied consistently.
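
As a minimal sketch of service-level tagging, the tag set can be a small fixed enumeration, with each alert inheriting its tags from the service that raised it. The service names and the SERVICE_TAGS mapping below are hypothetical stand-ins for whatever your service catalog exports; nothing here is a Backstage or OpsLevel API.

```python
from enum import Enum


class ImpactTag(Enum):
    """The whole tag set; kept deliberately small."""
    REVENUE = "revenue"
    COMPLIANCE = "compliance"
    DATA_LOSS = "data-loss"
    SECURITY = "security"
    REPUTATION = "reputation"
    INTERNAL = "internal"


MAX_TAGS = 7  # upper bound on the size of the tag set

# Hypothetical catalog export: service name -> tags declared on that service.
SERVICE_TAGS = {
    "checkout": {ImpactTag.REVENUE},
    "audit-log": {ImpactTag.COMPLIANCE, ImpactTag.DATA_LOSS},
    "build-cache": {ImpactTag.INTERNAL},
}


def tag_set_is_small_enough() -> bool:
    """Guard against the tag set growing past the agreed limit."""
    return len(ImpactTag) <= MAX_TAGS


def tags_for_alert(service: str) -> set[ImpactTag]:
    """An alert carries no tags of its own; it inherits them from its service."""
    return SERVICE_TAGS.get(service, set())
```

Because tags are looked up by service, adding a new alert to checkout needs no tagging work; an unknown service simply yields an empty set, which makes it easy to spot as untagged.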

Routing by tag

Routing by tag matches the audience to the impact. Revenue-impacting and compliance-critical page the on-call and notify a leadership-included Slack channel; internal-only pages on-call only; data-loss pages on-call, infrastructure leads, and the data team because three audiences need one alert.

Revenue-impacting and compliance-critical. Page on-call plus a Slack channel that includes leadership.
Internal-only. Page on-call only; no leadership notification.
Data-loss. Page on-call, infrastructure leads, and the data team; three audiences for one alert.
Per-tag routing rule. Commit the routing rule for each tag to the Alertmanager config, so anyone investigating an incident can see why it was routed the way it was.
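
The routing rules above can be sketched as a lookup from tag to notification targets. This is an illustration only, not Alertmanager or PagerDuty configuration: the target strings (such as "slack:#leadership-incidents") are made-up names, and falling back to on-call for tags without an explicit rule is an assumption.

```python
# Hypothetical targets; a real setup would map these to PagerDuty
# escalation policies, Slack channels, and so on.
ROUTES = {
    "revenue": ["page:on-call", "slack:#leadership-incidents"],
    "compliance": ["page:on-call", "slack:#leadership-incidents"],
    "data-loss": ["page:on-call", "page:infra-leads", "page:data-team"],
    "internal": ["page:on-call"],
}

# Assumption: tags without an explicit rule still page on-call.
DEFAULT_ROUTE = ["page:on-call"]


def route(tags: set[str]) -> list[str]:
    """Return the de-duplicated targets for an alert carrying these tags."""
    targets: list[str] = []
    for tag in sorted(tags):  # sorted so the result is deterministic
        for target in ROUTES.get(tag, DEFAULT_ROUTE):
            if target not in targets:
                targets.append(target)
    # An alert with no tags at all still goes somewhere.
    return targets or list(DEFAULT_ROUTE)
```

An alert tagged both revenue and data-loss reaches on-call once, plus the infrastructure leads, the data team, and the leadership channel, so overlapping rules do not produce duplicate pages.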

Dashboards by tag

Tag-filtered dashboards make the operational view tractable. A “revenue-impacting open incidents” dashboard filters the incident list by tag; MTTR-per-tag tracking surfaces priority misalignment because revenue alerts should resolve faster than internal ones; the weekly ops review shows the tag distribution.

Tag-filtered incident list. Build a “revenue-impacting open incidents” dashboard; filter by tag.
MTTR per tag. Revenue alerts should resolve faster than internal ones; if not, priorities are wrong.
Weekly ops review distribution. The tag mix tells you where attention is going.
Per-tag baseline. Document each tag’s normal alert volume so that anomalies in the mix itself can be detected.
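
A rough sketch of the MTTR-per-tag check, assuming incidents can be exported as records with tags, an opened time, and a resolved time. The field names and the sample incidents are invented for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean


def mttr_by_tag(incidents: list[dict]) -> dict[str, float]:
    """Mean time to resolve, in minutes, for each tag."""
    durations = defaultdict(list)
    for incident in incidents:
        elapsed = incident["resolved"] - incident["opened"]
        minutes = elapsed.total_seconds() / 60
        for tag in incident["tags"]:
            durations[tag].append(minutes)
    return {tag: mean(values) for tag, values in durations.items()}


def priorities_misaligned(mttr: dict[str, float]) -> bool:
    """True when revenue incidents take longer to resolve than internal ones."""
    if "revenue" not in mttr or "internal" not in mttr:
        return False  # not enough data to compare
    return mttr["revenue"] > mttr["internal"]


# Made-up sample data.
t0 = datetime(2024, 1, 1, 12, 0)
SAMPLE_INCIDENTS = [
    {"tags": ["revenue"], "opened": t0, "resolved": t0 + timedelta(minutes=30)},
    {"tags": ["revenue"], "opened": t0, "resolved": t0 + timedelta(minutes=50)},
    {"tags": ["internal"], "opened": t0, "resolved": t0 + timedelta(minutes=20)},
]
```

With this sample, revenue incidents average 40 minutes against 20 for internal ones, so priorities_misaligned reports True: the signal that attention is going to the wrong place.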

Get started

Start small and iterate. Define the tag set in 30 minutes and don’t over-engineer; tag the top 20 services this week and push tags into PagerDuty as service properties; build one tag-filtered dashboard because the first dashboard sells the rest.

Define tags in 30 minutes. Don’t over-engineer; you can add tags later.
Tag top 20 services this week. Push tags into PagerDuty as service properties.
Build one dashboard first. The first dashboard sells the rest; the visible value drives adoption.
Per-quarter tag-coverage check. Flag untagged services each quarter so coverage keeps growing.
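
The quarterly coverage check can be as simple as the sketch below, assuming the catalog can be exported as a mapping from service name to its set of tags. The export format and the service names are hypothetical.

```python
def tag_coverage(service_tags: dict[str, set[str]]) -> tuple[float, list[str]]:
    """Return the fraction of tagged services and the names of untagged ones."""
    untagged = sorted(name for name, tags in service_tags.items() if not tags)
    total = len(service_tags)
    # An empty catalog has nothing left to tag, so treat it as fully covered.
    coverage = (total - len(untagged)) / total if total else 1.0
    return coverage, untagged


# Made-up catalog export for illustration.
CATALOG = {
    "checkout": {"revenue"},
    "audit-log": {"compliance", "data-loss"},
    "build-cache": {"internal"},
    "legacy-batch": set(),
}
```

Running tag_coverage(CATALOG) gives 0.75 coverage and flags legacy-batch, which is the list to hand to service owners.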