By Samson Tanimawo, PhD. Published Apr 17, 2026.

Anomaly Detection vs Static Thresholds

Two approaches to alerting. Which one fits depends on the workload's traffic pattern.

Where static thresholds win

SLA values: 99.9% availability, 200ms p99 latency, 5% error rate. These are contractual numbers; the threshold is the contract.
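
Here is a minimal Python sketch of such a contractual check. It assumes the three metrics are already aggregated over the SLA window. `SlaThresholds` and `sla_breaches` are hypothetical names, and the default values are the SLA examples above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SlaThresholds:
    # Defaults are the example SLA values from the text.
    min_availability: float = 0.999      # 99.9% availability
    max_p99_latency_ms: float = 200.0    # 200 ms p99 latency
    max_error_rate: float = 0.05         # 5% error rate


def sla_breaches(availability: float, p99_latency_ms: float, error_rate: float,
                 sla: SlaThresholds = SlaThresholds()) -> list[str]:
    """Return the names of the SLA thresholds that are currently breached."""
    breaches = []
    if availability < sla.min_availability:
        breaches.append("availability")
    if p99_latency_ms > sla.max_p99_latency_ms:
        breaches.append("p99_latency")
    if error_rate > sla.max_error_rate:
        breaches.append("error_rate")
    return breaches
```

The alert reads directly as the contract term it violates, so the on-call engineer does not need to interpret it.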

Stable workloads. If traffic is predictable within 20%, a static threshold catches real outliers.
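
You can test the "predictable within 20%" condition before committing to a static threshold. The Python sketch below assumes that "within 20%" means every sample in a representative history window stays within ±20% of that window's mean. `is_stable` is a hypothetical helper, not a library function.

```python
from statistics import mean


def is_stable(samples: list[float], tolerance: float = 0.20) -> bool:
    """True if every sample lies within ±tolerance (fractional) of the mean."""
    if not samples:
        return False  # no history, no evidence of stability
    center = mean(samples)
    if center == 0:
        # Relative deviation is undefined; only an all-zero series is "stable".
        return all(s == 0 for s in samples)
    return all(abs(s - center) / abs(center) <= tolerance for s in samples)
```

Run it over a few weeks of history at the resolution you alert on. A series that fails the check is a candidate for anomaly detection instead.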

Cost: cheap to define, cheap to debug. The on-call engineer understands what triggered the alert without reading model output.

Where anomaly detection wins

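Seasonal traffic. E-commerce during holidays, payroll on the 15th, weekday vs weekend patterns. A single fixed threshold either fires on every normal peak or misses real drops during the quiet periods.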
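
The simplest seasonal baseline compares each new value with what the same hour of the week looked like historically. This Python sketch assumes hourly samples and an illustrative ±50% tolerance. The function names are hypothetical, and production tools typically use richer models than a per-slot median.

```python
from collections import defaultdict
from datetime import datetime
from statistics import median


def hour_of_week(ts: datetime) -> int:
    # Monday 00:00 -> 0, Sunday 23:00 -> 167
    return ts.weekday() * 24 + ts.hour


def build_baseline(history: list[tuple[datetime, float]]) -> dict[int, float]:
    """Median value for each hour-of-week slot seen in the history."""
    buckets: dict[int, list[float]] = defaultdict(list)
    for ts, value in history:
        buckets[hour_of_week(ts)].append(value)
    return {slot: median(values) for slot, values in buckets.items()}


def is_seasonal_anomaly(ts: datetime, value: float, baseline: dict[int, float],
                        tolerance: float = 0.5) -> bool:
    """Flag values more than ±tolerance (fractional) away from the slot median."""
    expected = baseline.get(hour_of_week(ts))
    if expected is None or expected == 0:
        return False  # no usable baseline for this slot yet
    return abs(value - expected) / expected > tolerance
```

Monday 10:00 is compared only with past Mondays at 10:00. As a result, a weekend trough no longer looks like an outage, while a weekday drop to weekend levels does.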

Per-tenant or per-region traffic. A static threshold on the global metric misses tenant-specific outages, because one small tenant dropping to zero barely moves the aggregate.
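
The following Python illustration shows why the global metric hides a tenant outage. Tenant names, traffic numbers, and the 50% drop limit are hypothetical, and the per-tenant baseline is assumed to come from history.

```python
def tenant_drops(current: dict[str, float], baseline: dict[str, float],
                 max_drop: float = 0.5) -> list[str]:
    """Tenants whose traffic fell by more than max_drop relative to baseline."""
    return sorted(
        tenant for tenant, expected in baseline.items()
        if expected > 0 and current.get(tenant, 0.0) < expected * (1 - max_drop)
    )


def global_drop_fraction(current: dict[str, float],
                         baseline: dict[str, float]) -> float:
    """Fractional drop of the summed (global) metric versus its baseline."""
    expected = sum(baseline.values())
    observed = sum(current.values())
    return 1 - observed / expected if expected else 0.0
```

Take a baseline of 10,000 requests for a large tenant and 100 for a small one. If the small tenant goes to zero, the global total shrinks by under 1%, while `tenant_drops` reports the outage immediately.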

High-cardinality metrics, where setting thresholds for each series by hand is impractical.
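
For high-cardinality metrics, the baseline has to be computed automatically for each series rather than set by hand. This Python sketch uses the median and the median absolute deviation (MAD), a robust choice that a few past spikes will not skew. The `k = 5` cutoff and all names are illustrative assumptions.

```python
from statistics import median


def robust_baseline(values: list[float]) -> tuple[float, float]:
    """Median and median absolute deviation (MAD) of one series' history."""
    center = median(values)
    mad = median([abs(v - center) for v in values])
    return center, mad


def anomalous_series(history: dict[str, list[float]], latest: dict[str, float],
                     k: float = 5.0) -> list[str]:
    """Series whose latest value is more than k MADs from its own median."""
    flagged = []
    for series, values in history.items():
        if series not in latest:
            continue
        center, mad = robust_baseline(values)
        # Guard against MAD == 0 (perfectly flat history): any deviation flags.
        if abs(latest[series] - center) > k * max(mad, 1e-9):
            flagged.append(series)
    return sorted(flagged)
```

The same function covers ten series or ten thousand, because each series is judged only against its own history.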

The trade-off

Anomaly detection has a higher false-positive rate by default. Tune sensitivity carefully, or you trade noisy static alerts for noisy ML alerts.
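
A common way to tune sensitivity is to replay history at several settings and count how many alerts each setting would have produced. This Python sketch assumes a MAD-based detector, where `k` is the number of MADs a point must deviate before it alerts. `count_alerts` is a hypothetical helper.

```python
from statistics import median


def count_alerts(values: list[float], k: float) -> int:
    """How many points lie more than k MADs from the series median."""
    center = median(values)
    # Fall back to a tiny MAD for flat series so the division-free test still works.
    mad = median([abs(v - center) for v in values]) or 1e-9
    return sum(1 for v in values if abs(v - center) > k * mad)
```

Pick the smallest `k` whose replayed alerts line up with incidents you actually care about. Lower values fire on ordinary noise.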

Anomaly alerts are harder to debug. Answering "Why did this fire?" requires the model's output (the expected value and band), not just a number.
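
One mitigation is to attach the model's output to the alert itself, so the on-call engineer sees the expected value and band next to the observed value. This Python sketch assumes a median/MAD-style band of `expected ± k * mad`. The class, field, and metric names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AnomalyExplanation:
    metric: str
    observed: float
    expected: float
    lower: float
    upper: float

    def message(self) -> str:
        # Human-readable summary for the alert body.
        return (f"{self.metric}={self.observed:g} outside expected band "
                f"[{self.lower:g}, {self.upper:g}] (baseline {self.expected:g})")


def explain(metric: str, observed: float, expected: float, mad: float,
            k: float = 3.0) -> Optional[AnomalyExplanation]:
    """Return an explanation if observed falls outside expected ± k*mad."""
    lower, upper = expected - k * mad, expected + k * mad
    if lower <= observed <= upper:
        return None  # inside the band: no alert
    return AnomalyExplanation(metric, observed, expected, lower, upper)
```

For example, `explain("checkout_rps", 40.0, 100.0, 10.0).message()` yields `checkout_rps=40 outside expected band [70, 130] (baseline 100)`.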

Tooling lock-in. Datadog Watchdog, Prometheus's MAD (median absolute deviation), and GCP's MQL each work differently, so switching tools means rewriting alerts.

Hybrid is usually right

Static thresholds on contractual SLAs and known dangerous values (disk > 90%, queue > 10k).

Anomaly detection on traffic-shape metrics where the normal range varies hourly or seasonally.

Don't replace static alerts wholesale. The static alerts that work are not the problem.
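
A hybrid rule set can be expressed as a single table that mixes both rule types. This Python sketch uses the disk and queue limits from the text. The `requests_per_sec` baseline and MAD are hypothetical constants that a seasonal model would normally supply.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StaticRule:
    metric: str
    limit: float  # fire when the value exceeds this known dangerous value

    def fires(self, value: float) -> bool:
        return value > self.limit


@dataclass(frozen=True)
class AnomalyRule:
    metric: str
    expected: float  # assumed to come from a baseline model
    mad: float
    k: float = 3.0

    def fires(self, value: float) -> bool:
        return abs(value - self.expected) > self.k * self.mad


RULES = [
    StaticRule("disk_used_pct", 90.0),
    StaticRule("queue_depth", 10_000),
    AnomalyRule("requests_per_sec", expected=1200.0, mad=80.0),
]


def firing(values: dict[str, float]) -> list[str]:
    """Metrics whose rule fires for the given current values."""
    return [r.metric for r in RULES
            if r.metric in values and r.fires(values[r.metric])]
```

Each rule keeps its own semantics. Static rules still read as "over the limit," and only the traffic-shape metric depends on a baseline.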

How to pick per metric

Is there a known dangerous value (an SLA or a capacity limit)? Use a static threshold.

Does the metric have strong seasonality? Use anomaly detection with a seasonality model.

Does the metric vary by tenant? Use per-tenant baselines via anomaly detection.
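
The three questions can be encoded as a small decision helper. In this Python sketch a metric may qualify for more than one approach, which matches the hybrid recommendation. Falling back to a static threshold when no question applies is an assumption, not part of the checklist above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricProfile:
    has_known_limit: bool   # an SLA value or capacity limit exists
    is_seasonal: bool       # strong daily/weekly/seasonal pattern
    varies_by_tenant: bool  # normal range differs per tenant


def choose_alerting(p: MetricProfile) -> list[str]:
    """Return every alerting approach the checklist suggests for a metric."""
    choices = []
    if p.has_known_limit:
        choices.append("static threshold")
    if p.is_seasonal:
        choices.append("anomaly detection with seasonality model")
    if p.varies_by_tenant:
        choices.append("per-tenant anomaly baselines")
    # Assumption: with no signal either way, start with the cheaper static alert.
    return choices or ["static threshold"]
```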