Alerts Practical · Samson Tanimawo, PhD · Published Apr 29, 2026

The Alert Rate Limit Pattern

Some alerts can flood. Rate limit them.

Why rate limiting alerts matters

A failing dependency can fire 10,000 alerts in an hour. Each is identical. The pager floods and real signal is lost.

Rate limit at the alerting layer: cap the number of alert events per group per hour.

This is not deduplication. Dedup collapses identical alerts; rate limiting caps even non-identical ones from the same source.
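To make the distinction concrete, here is a minimal sketch of a per-key sliding-window limiter in Python. The key format and cap are illustrative assumptions, not the API of any specific tool:

```python
import time
from collections import defaultdict, deque

class AlertRateLimiter:
    """Caps alert events per group key per window. Dedup would collapse
    identical alerts; this caps even distinct ones sharing a key."""

    def __init__(self, max_per_window, window_seconds=3600.0):
        self.max_per_window = max_per_window
        self.window_seconds = window_seconds
        self._events = defaultdict(deque)  # key -> timestamps of allowed events

    def allow(self, key, now=None):
        now = time.monotonic() if now is None else now
        q = self._events[key]
        # Evict timestamps that fell out of the sliding window.
        while q and now - q[0] >= self.window_seconds:
            q.popleft()
        if len(q) < self.max_per_window:
            q.append(now)
            return True
        return False
```

With a cap of 100 per hour, alert number 10,000 from the same failing dependency is rejected rather than paged, whether or not its text matches the previous 9,999.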

How to implement

Alertmanager: tune group_interval and repeat_interval per route. Set repeat_interval to 1h for critical, 4h for non-critical.
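A routing tree along those lines might look like this (the severity label values are assumptions about your alert labels; the fields are standard Alertmanager route settings):

```yaml
route:
  receiver: default-pager
  group_by: ['alertname', 'service', 'region']
  group_wait: 30s       # batch the first notification for a new group
  group_interval: 5m    # minimum gap between notifications for the same group
  routes:
    - matchers: ['severity="critical"']
      receiver: default-pager
      repeat_interval: 1h   # re-page critical alerts hourly while firing
    - matchers: ['severity=~"warning|info"']
      receiver: default-pager
      repeat_interval: 4h   # non-critical: remind at most every 4h
```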

PagerDuty: event rules with rate-limit actions. Drop or downgrade events above N/hour.

OpsGenie: notification policies with frequency caps.

What to rate-limit by

By alert name + service + region. Not by alert name alone; that hides regional outages.

By owner team. A team should not receive more than 10 distinct pages per hour. Above that, escalate to the team lead.

By integration source. If a webhook starts spamming, cap the source before it floods downstream.
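These three grouping choices reduce to key functions. A sketch, assuming hypothetical payload field names (alertname, service, region, owner_team, integration):

```python
def alert_key(alert):
    """Per alert name + service + region, so one region's noise
    cannot hide another region's outage."""
    return f"{alert['alertname']}|{alert['service']}|{alert['region']}"

def team_key(alert):
    """Per owner team, for the distinct-pages-per-hour cap."""
    return f"team:{alert['owner_team']}"

def source_key(alert):
    """Per integration source, so a spamming webhook is capped early."""
    return f"source:{alert['integration']}"
```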

When rate limiting hides outages

A rate limit that drops alerts silently is dangerous. The outage is happening; the page count is fake.

Always log dropped alerts. Add a meta-alert if drops exceed N/hour.

Prefer suppression with a marker ("5,000 similar alerts suppressed") over hard drops.
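A sketch of suppression-with-marker, assuming an in-process wrapper around the cap (class and method names are illustrative): over-cap alerts are counted rather than silently dropped, and one summary alert stands in for them.

```python
from collections import Counter

class SuppressingLimiter:
    """Per-key cap where over-cap alerts are suppressed but counted,
    and a marker alert is emitted instead of silence."""

    def __init__(self, max_per_window):
        self.max_per_window = max_per_window
        self._sent = Counter()
        self._dropped = Counter()

    def submit(self, key, alert):
        if self._sent[key] < self.max_per_window:
            self._sent[key] += 1
            return alert          # deliver normally
        self._dropped[key] += 1   # suppressed, but logged in the count
        return None

    def flush_summary(self, key):
        """One marker alert per window instead of a hard drop."""
        n = self._dropped.pop(key, 0)
        if n == 0:
            return None
        return {"summary": f"{n:,} similar alerts suppressed", "key": key}
```

The flush_summary output doubles as the meta-alert: fire it whenever the dropped count exceeds your N/hour threshold.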

Default settings

repeat_interval = 1h for sev1, 4h for sev2, effectively off for sev3 (set it very long). Stops re-page oscillation without losing visibility.

Per-team cap = 10 distinct alerts per hour. Above that, escalate or batch.

Test by simulating an outage. Fire 1,000 alerts in 5 minutes against staging and confirm only the bounded number reaches paging.
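The drill can be scripted. A self-contained sketch with an in-process stand-in for the limiter (the 100/hour cap and group key are assumptions):

```python
from collections import Counter

def simulate_storm(cap_per_hour, fired):
    """Fire `fired` alerts in the same group within one window and
    count how many would reach paging under a per-group cap."""
    sent = Counter()
    paged = 0
    for _ in range(fired):
        key = "HighErrorRate|billing|us-east"  # one alert group
        if sent[key] < cap_per_hour:
            sent[key] += 1
            paged += 1
    return paged

# 1,000 alerts fired; only the bounded number pages.
assert simulate_storm(cap_per_hour=100, fired=1000) == 100
```

Against real staging, the assertion becomes a check on your paging provider's delivered-notification count rather than a local counter.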