Pull vs Push Alerting
Whether the alerting system pulls state from its sources or the sources push events to it, and the trade-offs of each.
The distinction
Pull alerting polls for state; push alerting receives events. Prometheus scrapes each target on a fixed interval (commonly 15 seconds) and evaluates its rules on every evaluation tick; Datadog agents push metrics, Sentry SDKs push errors, and CloudWatch alarms push notifications when they fire. Most teams run a mix, and picking which model serves which signal is the actual question.
- Pull alerting. The system polls for state; Prometheus scrapes each target on a fixed interval (commonly 15 seconds), and the rule evaluator runs each rule on every evaluation tick.
- Push alerting. The system receives events; Datadog agents push metrics, Sentry pushes errors, CloudWatch alarms push notifications when they fire.
- Hybrid is the norm. Most teams run both; the question is which model for which signal.
- Per-signal selection. The right model is signal-shape dependent; standardising on one model produces the wrong answer for the other half.
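The core difference is who owns the schedule. A minimal sketch, assuming a scrape is modelled as a plain callable rather than an HTTP request; `pull_evaluate`, `push_receive`, and the threshold rule are hypothetical illustrations, not how Prometheus or any push vendor implements rule evaluation (Prometheus rules are PromQL, not Python).

```python
import time
from typing import Callable, Iterable


# Pull: the evaluator owns the schedule. Each tick it asks the target for its
# current state and evaluates the rule against whatever comes back.
def pull_evaluate(scrape: Callable[[], float], threshold: float, ticks: int,
                  interval_s: float = 0.0) -> list[bool]:
    results = []
    for _ in range(ticks):
        value = scrape()                   # poll the target for current state
        results.append(value > threshold)  # rule evaluated once per tick
        time.sleep(interval_s)             # hypothetical fixed scrape interval
    return results


# Push: the producer owns the schedule. The receiver evaluates the rule each
# time an event arrives, however many arrive and whenever they arrive.
def push_receive(events: Iterable[float], threshold: float) -> list[bool]:
    return [value > threshold for value in events]
```

With the same readings both produce the same verdicts; what differs is that `pull_evaluate` decides when to look, while `push_receive` only reacts to what the producer chose to send.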
When pull wins
Pull wins when the infrastructure is predictable and the team controls the scrape target: services exposing /metrics endpoints, databases with exporters, Kubernetes-hosted internal infrastructure, or anywhere that centralised control over scrape intervals matters and the target can answer a poll.
- Predictable infrastructure. Services exposing /metrics endpoints, databases with exporters, hosts with node-exporter.
- Centralised scrape control. Scrape intervals live in the Prometheus configuration; one place to tune.
- Kubernetes-hosted services. Internal infrastructure where the team controls the target; the natural fit for pull.
- Per-target /metrics convention. The target exposes a known endpoint; the convention transfers across services.
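Part of why the /metrics convention transfers is that the format is plain text. A minimal standard-library sketch of a target rendering one gauge in the Prometheus text exposition format; `render_gauge`, `MetricsHandler`, and the `queue_depth` metric are hypothetical, label-value escaping is skipped, and a production service would normally use an official client library instead.

```python
from http.server import BaseHTTPRequestHandler
from typing import Mapping


def render_gauge(name: str, help_text: str,
                 samples: list[tuple[Mapping[str, str], float]]) -> str:
    """Render one gauge family in the Prometheus text exposition format."""
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} gauge"]
    for labels, value in samples:
        # Labels are sorted so output is stable; values are assumed escape-free.
        label_str = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
        suffix = "{" + label_str + "}" if label_str else ""
        lines.append(f"{name}{suffix} {value}")
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    """Answers scrapes on the conventional /metrics path; 404 elsewhere."""

    def do_GET(self) -> None:
        if self.path != "/metrics":
            self.send_error(404)
            return
        # Hypothetical metric; a real service would read live state here.
        body = render_gauge("queue_depth", "Items waiting in the work queue.",
                            [({"queue": "default"}, 3.0)]).encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
```

Serving it takes one more line, `http.server.HTTPServer(("", 8000), MetricsHandler).serve_forever()`, after which a Prometheus scrape job pointed at port 8000 could poll it.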
When push wins
Push wins when the events are bursty, the producer is third-party, or the producer is intermittently online. Pull would miss bursty events between scrapes; SaaS webhooks cannot expose a Prometheus endpoint; mobile and edge clients may be offline at scrape time but online to push later.
- Bursty events. Errors, deploys, audit logs; pull would miss them between scrapes.
- Third-party services. SaaS products that send webhooks; you cannot make them expose a Prometheus endpoint.
- Mobile and edge. Clients that may be offline at scrape time but online to push later.
- Per-event traceability. Push events carry full context at the source; useful when the event itself is the signal.
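The bursty-event argument can be made concrete with a toy timeline. The sketch assumes one-second resolution, a hypothetical error that is active only at t = 7 and t = 8, and a 15-second scrape interval. The poller samples at 0, 15, 30, and 45 and never sees the error; a producer that emits one event per onset reports it once.

```python
from typing import Callable

# Hypothetical timeline: the error condition is active only for t in [7, 9).
ERROR_WINDOWS = [(7, 9)]


def error_active(t: int) -> bool:
    return any(start <= t < end for start, end in ERROR_WINDOWS)


def pull_detections(state_at: Callable[[int], bool], interval: int,
                    horizon: int) -> list[int]:
    """Scrape times (0, interval, 2*interval, ...) at which the poller saw the error."""
    return [t for t in range(0, horizon, interval) if state_at(t)]


def push_detections(state_at: Callable[[int], bool], horizon: int) -> list[int]:
    """Times at which the producer pushed an event: each False -> True transition."""
    # The producer observes every state change locally and emits one event per onset.
    return [t for t in range(horizon) if state_at(t) and not state_at(t - 1)]
```

Real pull setups soften this by having the target count events in a counter, so the increment is still visible at the next scrape; what the scrape cannot recover is the per-event context a pushed event carries.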
Operational differences
The two models break differently. Pull is easier to debug because the last-scrape timestamp is visible; push is easier to scale because the producer controls rate. Pull breaks if the target hides behind NAT; push breaks if the consumer’s queue is full.
- Debuggability. Pull is easier to debug; the last-scrape timestamp is visible, so missing data is obvious.
- Scalability. Push is easier to scale; the producer controls the rate and the consumer absorbs it.
- Network failure modes. Pull breaks if the target hides behind NAT; push breaks if the consumer’s queue is full.
- Per-failure runbook. The two models need different runbooks; the on-call needs both for hybrid stacks.
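Each model's main failure signal fits in a few lines. For pull, it is staleness of the last successful scrape (Prometheus also records an `up` series per target for this purpose); for push, it is a bounded receive queue that rejects and counts events once full. The two-interval staleness threshold and the names `scrape_is_stale` and `PushReceiver` are assumptions for illustration.

```python
import queue


def scrape_is_stale(last_scrape_ts: float, now: float, interval_s: float,
                    missed: int = 2) -> bool:
    """Pull failure signal: no successful scrape within `missed` intervals."""
    return now - last_scrape_ts > missed * interval_s


class PushReceiver:
    """Push failure signal: a bounded queue that counts the events it dropped."""

    def __init__(self, capacity: int) -> None:
        # capacity must be >= 1; queue.Queue treats maxsize=0 as unbounded.
        self._q: queue.Queue = queue.Queue(maxsize=capacity)
        self.dropped = 0

    def accept(self, event: dict) -> bool:
        try:
            self._q.put_nowait(event)
            return True
        except queue.Full:
            self.dropped += 1  # the event is lost unless the producer retries
            return False

    def drain(self) -> list:
        items = []
        while True:
            try:
                items.append(self._q.get_nowait())
            except queue.Empty:
                return items
```

The two checks lead to different runbook steps: a stale scrape points at the target or the network path to it, while a rising `dropped` count points at consumer capacity.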
Pick by signal type
The right pick is signal-driven. Steady-state metrics and infrastructure health are pull; events, errors, and audit logs are push; hybrid stacks are the norm and trying to standardise on one model produces the wrong answer for the other half.
- Steady-state metrics. Infrastructure health and service metrics exposed on /metrics endpoints; pull (Prometheus) is the natural fit.
- Events and errors. Errors, deploys, and audit logs; push (Datadog Events, Sentry, OpenTelemetry) is the natural fit.
- Hybrid stacks are the norm. Don’t try to standardise on one model; signal shape drives the choice.
- Per-signal documented choice. Commit the signal-to-model mapping to the team handbook so the choice stays consistent across services.
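A documented signal-to-model mapping can be as small as a lookup table checked in next to the handbook. A sketch, assuming hypothetical signal-kind names that follow the split above; unmapped kinds raise instead of defaulting, so a new kind of signal forces an explicit decision.

```python
from enum import Enum


class Model(Enum):
    PULL = "pull"
    PUSH = "push"


# Hypothetical handbook mapping from signal shape to collection model.
SIGNAL_MODEL = {
    "steady_state_metric": Model.PULL,
    "infrastructure_health": Model.PULL,
    "event": Model.PUSH,
    "error": Model.PUSH,
    "deploy": Model.PUSH,
    "audit_log": Model.PUSH,
}


def model_for(signal_kind: str) -> Model:
    try:
        return SIGNAL_MODEL[signal_kind]
    except KeyError:
        # An unmapped kind is a handbook gap, not something to guess at.
        raise ValueError(f"no documented model for signal kind {signal_kind!r}") from None
```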