Pre-Prod Alert Noise

Pre-prod alerts shouldn't page production on-call.

Where pre-prod noise comes from

Pre-prod noise has predictable sources. Staging clusters reuse production alert configs, so they fire on every test, chaos run, and flaky deploy. Pre-prod has fewer humans, so the page rate per engineer is often higher than in production. And pre-prod alerts are often misrouted to the production rotation, paging the on-call at 2am for a staging issue.

Separate paging for pre-prod

Pre-prod paging needs to be separated from production. Route pre-prod alerts to a dedicated channel rather than the production on-call rotation, and keep sev2 and below Slack-only. A pre-prod sev1 still pages, but it goes to the team’s business-hours rotation, not the 24/7 on-call. Tag every alert with its environment so that routing can act on it.
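The routing rules above can be sketched as a small function. This is a minimal illustration, not any particular alerting tool's configuration: the Alert type, the environment names, and the destination strings are all hypothetical, and production routing is reduced to a single placeholder because the policy here only concerns pre-prod.

```python
from dataclasses import dataclass

# Hypothetical environment names; substitute whatever labels your alerts carry.
PREPROD_ENVIRONMENTS = {"staging", "dev", "qa"}


@dataclass(frozen=True)
class Alert:
    name: str
    environment: str  # every alert must carry this label
    severity: int     # 1 is the most severe


def route(alert: Alert) -> str:
    """Return a (hypothetical) destination name for an alert."""
    if alert.environment == "prod":
        # Production routing is unchanged; one placeholder stands in for it.
        return "prod-oncall-24x7"
    if alert.environment not in PREPROD_ENVIRONMENTS:
        # Refuse to guess: an untagged or unknown environment is a config bug.
        raise ValueError(f"alert {alert.name!r} has no known environment label")
    if alert.severity == 1:
        # Pre-prod sev1 still pages, but only the business-hours rotation.
        return "team-business-hours-pager"
    # Sev2 and below never page; they go to the dedicated channel.
    return "slack-preprod-alerts"
```

Raising on a missing label, rather than falling back to a default route, is one way to enforce the tagging rule: an untagged alert fails loudly instead of quietly landing on the production rotation.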

Mute during known events

Known events should mute pre-prod alerts. CI runs, chaos drills, and performance tests should mute alerts on the affected services for their duration; a maintenance-mode API gives CI a hook to start a mute window before a destructive test and end it afterwards. Without muting, the team learns to ignore alerts, and that habit carries over to production.
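One way to picture the start-and-end hook is a context manager that wraps the destructive test. The sketch below uses an in-memory class as a stand-in for a maintenance-mode API; MaintenanceWindows, muted, and the service names are hypothetical, and a real CI job would call the alerting system's own endpoints instead.

```python
from contextlib import contextmanager


class MaintenanceWindows:
    """In-memory stand-in for a maintenance-mode API (hypothetical)."""

    def __init__(self):
        self._open = {}  # service name -> number of open mute windows

    def start(self, service):
        self._open[service] = self._open.get(service, 0) + 1

    def end(self, service):
        remaining = self._open.get(service, 0) - 1
        if remaining > 0:
            self._open[service] = remaining
        else:
            self._open.pop(service, None)

    def is_muted(self, service):
        return service in self._open


@contextmanager
def muted(windows, services):
    """Mute alerts on the given services for the duration of the block."""
    for service in services:
        windows.start(service)
    try:
        yield
    finally:
        # Always lift the mute, even if the test crashes, so a failed
        # CI job cannot leave a service silenced indefinitely.
        for service in services:
            windows.end(service)
```

A CI step would then wrap the chaos or load test in `with muted(windows, ["checkout", "payments"]): ...`. Counting open windows per service keeps overlapping jobs from unmuting each other early.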

Pre-prod gets a noise budget too

Pre-prod alert volume deserves a budget. The target is 10-20% of production volume; anything higher means the configs are too noisy or staging itself is broken. A pre-prod page count above the production count is a red flag worth investigating the same week. Review pre-prod alerts on the same quarterly cadence as production.
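The budget check is simple arithmetic and can be sketched as follows. The function name, its parameters, and the finding strings are hypothetical; the counts are assumed to come from whatever alert history the team already exports, and the default ratio uses the upper end of the 10-20% target.

```python
def check_noise_budget(preprod_alerts, prod_alerts,
                       preprod_pages, prod_pages,
                       budget_ratio=0.20):
    """Compare pre-prod alert and page counts against production.

    All counts are totals over the same period (for example, one week).
    Returns a list of findings; an empty list means within budget.
    """
    findings = []
    # Volume budget: pre-prod alerts should stay under a fraction of prod.
    if preprod_alerts > budget_ratio * prod_alerts:
        findings.append("pre-prod alert volume is over budget")
    # Red flag: pre-prod paged more often than production did.
    if preprod_pages > prod_pages:
        findings.append("pre-prod pages exceed production pages")
    return findings
```

For example, 250 pre-prod alerts against 1,000 production alerts is 25% of production volume, which exceeds a 20% budget, while 150 against 1,000 is 15% and passes.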

How to fix pre-prod noise

Fixing pre-prod noise is concrete work: put an environment label on every alert, with routing that doesn’t page production on-call; add muting hooks in CI for chaos and load tests; and remove or downgrade alerts in pre-prod only, because production alerts should not run in staging without modification.