Business-Impact-Tagged Alerts
Tag alerts with business impact for prioritisation.
Why tag with business impact
Business-impact tags let one alert serve multiple audiences. Engineers need technical context and executives need business context; tags like “revenue-impacting”, “compliance-critical”, and “customer-facing” give each audience the right view automatically and let routing match the stakes.
- Two audiences, one alert. Engineers need technical context; executives need business context; tags route both correctly.
- Common tags. revenue-impacting, compliance-critical, customer-facing; the human-readable business framing.
- Routing matches stakes. Without tags every alert escalates the same way; with tags the routing matches the stakes.
- Per-service tagging discipline. Tags live with the service definition; every new service inherits the convention.
The tag set
Keep the tag set small: five to seven tags at most, for example revenue, compliance, data-loss, security, reputation, internal. Apply tags at the service level rather than per alert; every alert on the checkout service is revenue-impacting, so tag once per service. Maintain the tags in the service catalog so drift is visible.
- 5-7 tags maximum. revenue, compliance, data-loss, security, reputation, internal; small enough to remember.
- Service-level application. Every alert on the checkout service is revenue-impacting; tag once per service.
- Service catalog as source. Keep the tags in a catalog such as Backstage or OpsLevel and sync them from there into PagerDuty and the alerting system.
- Per-tag definition committed. Document what each tag, such as “revenue-impacting”, means and commit that definition next to the tag set so teams apply it consistently.
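A small tag set with committed definitions can be checked mechanically. The following is a minimal sketch, not a catalog integration: the tag names come from the list above, while the definition texts, the `Service` class, and `unknown_tags` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field

# Tag names are from the tag set above. The definitions are illustrative
# placeholders (assumptions), not agreed wording.
TAG_DEFINITIONS = {
    "revenue": "An outage directly blocks or reduces sales.",
    "compliance": "An outage risks a regulatory or contractual breach.",
    "data-loss": "A failure can destroy or corrupt stored data.",
    "security": "A failure can expose systems or data to attackers.",
    "reputation": "A failure is visible to customers or the public.",
    "internal": "Only internal users are affected.",
}


@dataclass
class Service:
    """Hypothetical stand-in for a service catalog entry."""
    name: str
    tags: set[str] = field(default_factory=set)


def unknown_tags(service: Service) -> set[str]:
    """Return tags on the service that are not in the agreed tag set."""
    return service.tags - set(TAG_DEFINITIONS)
```

Running a check like `unknown_tags` in CI against the catalog export is one way to keep ad hoc tags from creeping in.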
Routing by tag
Routing by tag matches the audience to the impact. Revenue-impacting and compliance-critical page the on-call and notify a leadership-included Slack channel; internal-only pages on-call only; data-loss pages on-call, infrastructure leads, and the data team because three audiences need one alert.
- Revenue-impacting and compliance-critical. Page on-call plus a Slack channel that includes leadership.
- Internal-only. Page on-call only; no leadership notification.
- Data-loss. Page on-call, infrastructure leads, and the data team; three audiences for one alert.
- Per-tag routing rule. Commit the routing rule for each tag to the Alertmanager config, so anyone investigating why an alert reached a given audience can read the rule and its change history.
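The routing rules above amount to a lookup from tag to notification targets. Here is a minimal sketch of that lookup in Python; the target names (`on-call`, `leadership-slack`, and so on) are hypothetical labels, and falling back to on-call for tags without an explicit rule is an assumption, not something the rules above specify.

```python
# Mapping mirrors the routing described above; target names are hypothetical.
ROUTES = {
    "revenue": ["on-call", "leadership-slack"],
    "compliance": ["on-call", "leadership-slack"],
    "internal": ["on-call"],
    "data-loss": ["on-call", "infrastructure-leads", "data-team"],
}

# Assumption: tags without an explicit rule page on-call only.
DEFAULT_ROUTE = ["on-call"]


def recipients(tags):
    """Return the de-duplicated notification targets for a service's tags."""
    targets = []
    for tag in sorted(tags):  # sorted for a deterministic result
        for target in ROUTES.get(tag, DEFAULT_ROUTE):
            if target not in targets:
                targets.append(target)
    # A service with no tags still pages on-call.
    return targets or list(DEFAULT_ROUTE)
```

A service tagged with several impacts gets the union of the targets, so a service that is both revenue and data-loss tagged notifies leadership, infrastructure leads, and the data team in one pass.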
Dashboards by tag
Tag-filtered dashboards make the operational view tractable. A “revenue-impacting open incidents” dashboard filters the incident list by tag; MTTR-per-tag tracking surfaces priority misalignment because revenue alerts should resolve faster than internal ones; the weekly ops review shows the tag distribution.
- Tag-filtered incident list. Build a “revenue-impacting open incidents” dashboard; filter by tag.
- MTTR per tag. Revenue alerts should resolve faster than internal ones; if not, priorities are wrong.
- Weekly ops review distribution. The tag mix tells you where attention is going.
- Per-tag baseline. Document each tag’s normal alert volume so that unusual shifts in the mix itself can be detected.
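MTTR per tag is a grouped mean over resolved incidents. The sketch below assumes incidents are available as `(tag, opened, resolved)` tuples; that shape and the function name `mttr_per_tag` are illustrative assumptions rather than any particular tool's export format.

```python
from collections import defaultdict
from datetime import timedelta


def mttr_per_tag(incidents):
    """Compute mean time to resolve per tag.

    incidents: iterable of (tag, opened, resolved) tuples, where opened and
    resolved are datetime objects. Returns {tag: mean resolution timedelta}.
    """
    durations = defaultdict(list)
    for tag, opened, resolved in incidents:
        durations[tag].append(resolved - opened)
    return {
        tag: sum(ds, timedelta()) / len(ds)
        for tag, ds in durations.items()
    }
```

Comparing the `revenue` entry against the `internal` entry in the result gives the priority check described above: if revenue incidents take longer on average, priorities deserve a second look.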
Get started
Start small and iterate. Define the tag set in 30 minutes and don’t over-engineer; tag the top 20 services this week and push tags into PagerDuty as service properties; build one tag-filtered dashboard because the first dashboard sells the rest.
- Define tags in 30 minutes. Don’t over-engineer; you can add tags later.
- Tag top 20 services this week. Push tags into PagerDuty as service properties.
- Build one dashboard first. The first dashboard sells the rest; the visible value drives adoption.
- Quarterly tag-coverage check. Flag untagged services each quarter so coverage keeps growing.
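The coverage check can be a few lines run against a catalog export. This is a minimal sketch assuming the export has been reduced to a dict of service name to tag set; `tag_coverage` is a hypothetical helper, and reporting an empty catalog as fully covered is an arbitrary choice.

```python
def tag_coverage(services):
    """Report tag coverage for a catalog.

    services: dict mapping service name to a set of tags.
    Returns (fraction of services with at least one tag, sorted untagged names).
    """
    untagged = sorted(name for name, tags in services.items() if not tags)
    covered = len(services) - len(untagged)
    # Assumption: an empty catalog counts as fully covered.
    fraction = covered / len(services) if services else 1.0
    return fraction, untagged
```

The list of untagged names is the actionable part: it is what gets flagged to service owners, while the fraction is the number to track from quarter to quarter.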