PagerDuty Routing Rules: The Hard Cases
How to route alerts to the right team: the hard cases and the patterns that handle them.
The easy cases
The easy cases work natively in PagerDuty: service to team, severity to escalation policy, business hours versus after hours. PagerDuty handles all of these with event rules. Most teams stop here because the catalog is small and the rules are clear; trouble starts when business and topology force exceptions.
- Service to team. Native PagerDuty event rule; the simplest mapping.
- Severity to escalation policy. Sev1 to one policy, sev2 to another; native support.
- Business hours vs after hours. Schedule-based routing; native support.
- Stop here when possible. Small catalog, clear rules; keep it that way as long as possible.
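The three easy mappings are simple enough to state as lookup tables. The sketch below is a local model of that decision, not PagerDuty configuration; the service names, policy names, and the 09:00 to 17:00 business-hours window are all assumptions made for illustration.

```python
from datetime import time

# Hypothetical catalog: none of these names come from PagerDuty itself.
SERVICE_TO_TEAM = {"checkout-api": "payments", "search-api": "discovery"}
SEVERITY_TO_POLICY = {"sev1": "page-immediately", "sev2": "page-after-15m"}
BUSINESS_HOURS = (time(9, 0), time(17, 0))  # assumed local window, end exclusive


def route(service: str, severity: str, local_time: time) -> dict:
    """Resolve the three easy mappings for one alert."""
    start, end = BUSINESS_HOURS
    in_hours = start <= local_time < end
    return {
        "team": SERVICE_TO_TEAM[service],
        "policy": SEVERITY_TO_POLICY[severity],
        "schedule": "business-hours" if in_hours else "after-hours",
    }
```

If a model like this stops fitting in three flat tables, that is usually the sign that a hard case has arrived.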
Cross-team services
Cross-team services break the simple model. A service is owned by team A but feeds a feature owned by team B: a page on the feature should hit team B, and a page on the platform should hit team A. Use PagerDuty Event Orchestration with custom fields: the alert payload tags the feature, and the rules route accordingly. Document the routing decision in the runbook.
- Feature vs platform routing. Feature page hits team B; platform page hits team A; same service.
- Event orchestration with custom fields. Alert payload tags the feature; rules route accordingly.
- Documented routing in runbook. Team B’s on-call shouldn’t have to ask team A for the history.
- Per-service routing decision. Each cross-team service has a documented routing rationale, so an investigator can see why a page landed where it did.
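One way to tag the feature is to carry it in the `custom_details` object of an Events API v2 trigger event. The sketch below builds such an event and mirrors the feature-versus-platform decision locally. The `feature` key, the `FEATURE_OWNERS` table, and the team names are assumptions for illustration; the real decision would live in your orchestration rules, and nothing here sends anything to PagerDuty.

```python
from typing import Optional

# Hypothetical ownership table: a feature tag maps to the team that owns it.
FEATURE_OWNERS = {"gift-cards": "team-b"}
PLATFORM_OWNER = "team-a"  # owner of the underlying service


def build_event(routing_key: str, summary: str, source: str,
                severity: str, feature: Optional[str] = None) -> dict:
    """Build an Events API v2 trigger event, tagging the feature if known."""
    details = {}
    if feature is not None:
        details["feature"] = feature  # assumed field name, not a PagerDuty standard
    return {
        "routing_key": routing_key,
        "event_action": "trigger",
        "payload": {
            "summary": summary,
            "source": source,
            "severity": severity,  # critical, error, warning, or info
            "custom_details": details,
        },
    }


def owning_team(event: dict) -> str:
    """Feature-tagged pages go to the feature owner; the rest go to the platform owner."""
    feature = event["payload"]["custom_details"].get("feature")
    return FEATURE_OWNERS.get(feature, PLATFORM_OWNER)
```

An untagged or unknown feature falls back to the platform owner, which keeps a missing tag from dropping the page entirely.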
Time-of-day routing
Time-of-day routing supports follow-the-sun coverage: pages route to APAC overnight, to the EU during the European day, and to the US during the American day. PagerDuty schedules support this, and event orchestration can override per service. Test the boundaries, because the 06:00 UTC handover is where routing bugs live.
- Follow-the-sun coverage. APAC overnight, EU European day, US American day.
- PagerDuty schedules support. Event orchestration can override per service.
- Test the boundaries. 06:00 UTC handover is where routing bugs live.
- Synthetic page at boundary. Fire a test page right at the handover and confirm it reaches the region coming on shift.
Vendor and third-party pages
For most services, vendor incidents are signals, not pages. Sources include AWS Health, Cloudflare incidents, and GitHub status. Route them to a Slack channel by default and page only if the affected service is tier 1. For vendors that publish through Statuspage, use its API to fan their incidents into your alerting backbone, then apply your own routing logic.
- Vendor sources. AWS Health, Cloudflare incidents, GitHub status; signals for most services.
- Slack default, tier-1 page. Route to Slack channel by default; page only if affected service is tier 1.
- Statuspage API fan-in. Vendor incidents into the alerting backbone; apply your own routing.
- Per-vendor routing rule. Each vendor source has a documented route, so on-call knows where that vendor's signals land during triage.
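The Slack-by-default, page-for-tier-1 rule comes down to a lookup once a vendor incident has been mapped to the services that depend on that vendor. The sketch below assumes two hypothetical tables, one for service tiers and one for vendor dependencies; how you build them, and how incidents reach this function, is up to your own backbone.

```python
# Hypothetical inventory: service tiers and which services depend on which vendor.
SERVICE_TIER = {"checkout-api": 1, "internal-wiki": 3}
VENDOR_DEPENDENTS = {
    "aws": ["checkout-api", "internal-wiki"],
    "github": ["internal-wiki"],
}


def destination(service: str) -> str:
    """Page only for tier-1 services; everything else is a Slack signal."""
    return "page" if SERVICE_TIER.get(service) == 1 else "slack"


def route_vendor_incident(vendor: str) -> dict:
    """Map each service affected by a vendor incident to its destination."""
    affected = VENDOR_DEPENDENTS.get(vendor, [])
    return {service: destination(service) for service in affected}
```

A service with no recorded tier goes to Slack here. That default is a design choice: an unclassified service produces a visible signal rather than an unexpected page.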
Apply this quarter
The application is concrete. Audit your event rules: anything older than 6 months without a recent edit is suspect. Document each non-trivial rule in a comment field for the next person who touches it. Run a synthetic page test monthly, because routing breaks silently and only synthetic tests catch it.
- 6-month staleness audit. Anything older without a recent edit is suspect; the audit catches drift.
- Per-rule comment field. Non-trivial rules carry context for the next person to touch them.
- Monthly synthetic page test. Routing breaks silently; only synthetic tests catch it.
- Per-quarter routing review. Revisit the rules every quarter so the audit, the comments, and the synthetic tests stay current.
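The staleness audit is easy to script once the rules are exported with their last-edit timestamps. The sketch below assumes each rule is a dict with hypothetical `id` and `updated_at` keys and treats 6 months as 183 days; adapt the field names to whatever your export actually contains.

```python
from datetime import datetime, timedelta, timezone

STALE_AFTER = timedelta(days=183)  # roughly 6 months, an approximation


def stale_rules(rules: list, now: datetime) -> list:
    """Return the ids of rules whose last edit is older than the cutoff."""
    return [r["id"] for r in rules if now - r["updated_at"] > STALE_AFTER]


if __name__ == "__main__":
    now = datetime(2024, 7, 1, tzinfo=timezone.utc)
    rules = [
        {"id": "rule-old", "updated_at": datetime(2023, 6, 1, tzinfo=timezone.utc)},
        {"id": "rule-new", "updated_at": datetime(2024, 5, 1, tzinfo=timezone.utc)},
    ]
    print(stale_rules(rules, now))
```

A flagged rule is not necessarily wrong, only unreviewed; the output is a reading list for the audit, not a deletion list.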