Anomaly Detection vs Static Thresholds
Static thresholds are simple but go stale. Anomaly detection adapts to the workload but is noisy. This post covers where each works and how to combine them.
When static wins
Anomaly detection and static thresholds both produce alerts, but they answer different questions. Static thresholds answer "is this above the line we said is acceptable?". Anomaly detection answers "is this different from how this workload normally behaves?". Each fits different situations; mature alerting uses both.
What favors static thresholds:
- Hard SLAs: When the team has a contractual or policy commitment ("response time must be under 200 ms"), the threshold is the policy. The alert fires when the SLA is at risk, and the line is non-negotiable.
- The threshold IS the policy: The alert and the policy are the same thing. There is no "this is normal for this workload"; the policy applies regardless of how the workload behaves.
- Predictable workloads: Workloads with a consistent baseline benefit from static thresholds. Because the baseline does not change, the threshold stays valid and needs little tuning.
- Cost: tuning as workloads shift. Static thresholds rot. As the workload changes (organic growth, feature changes, shifting traffic patterns), the threshold can become inappropriate. Without periodic tuning, it either fires too often or not often enough.
- The threshold rots without maintenance: The maintenance is real but bounded. A quarterly threshold review prevents rot; without review, thresholds drift until they stop being useful.
Static thresholds are right for clear policy-driven alerts. The simplicity matches the use case.
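A minimal Python sketch of a policy-style static rule, assuming a latency metric sampled at a fixed interval. `StaticRule`, `fire_rate`, and `review_threshold` are hypothetical names, and the sustain window (`for_samples`) and the 5% noise budget are illustrative assumptions rather than values from any particular alerting product. The review helper replays recent history to flag a threshold that fires too often or never, so a human can decide whether it still reflects the policy.

```python
from dataclasses import dataclass


@dataclass
class StaticRule:
    """Hypothetical static rule: fire when the metric stays above `limit`
    for `for_samples` consecutive samples (a sustain window that ignores one-off spikes)."""
    limit: float
    for_samples: int = 3

    def fires(self, samples: list[float]) -> bool:
        # Fire only if the most recent `for_samples` values all exceed the limit.
        if len(samples) < self.for_samples:
            return False
        return all(v > self.limit for v in samples[-self.for_samples:])


def fire_rate(rule: StaticRule, history: list[float]) -> float:
    """Fraction of samples at which `rule.fires` would have been true."""
    run = fired = 0
    for value in history:
        # Length of the current streak of consecutive breaches.
        run = run + 1 if value > rule.limit else 0
        if run >= rule.for_samples:
            fired += 1
    return fired / len(history) if history else 0.0


def review_threshold(rule: StaticRule, history: list[float],
                     max_fire_rate: float = 0.05) -> str:
    """Quarterly-review helper (assumed noise budget): label the rule against recent history."""
    rate = fire_rate(rule, history)
    if rate > max_fire_rate:
        return "too noisy"
    if rate == 0.0:
        return "silent"
    return "ok"
```

For the 200 ms SLA above, `StaticRule(limit=200.0).fires(latest_samples)` would fire only after three consecutive breaches; a "silent" review result is not necessarily wrong for a policy line, it just prompts someone to confirm the line still matters.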
When anomaly wins
Anomaly detection learns the workload's normal pattern and alerts on deviations. The approach fits workloads with strong daily, weekly, or seasonal patterns, where the variation is too large for a single static threshold to accommodate.
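To make "learns the workload's normal pattern" concrete, here is a deliberately simple Python sketch that keeps a separate baseline for each (weekday, hour) bucket and scores new values by z-score against that bucket. Production anomaly detectors typically use richer models; the bucket scheme, the 3-sigma limit, the minimum sample count, and the `SeasonalBaseline` name are all assumptions for illustration.

```python
import statistics
from collections import defaultdict
from datetime import datetime


class SeasonalBaseline:
    """Hypothetical hour-of-week baseline: learns mean and spread per
    (weekday, hour) bucket and flags values far from that bucket's norm."""

    def __init__(self, z_limit: float = 3.0, min_samples: int = 4):
        self.z_limit = z_limit
        self.min_samples = min_samples
        self.buckets = defaultdict(list)  # (weekday, hour) -> observed values

    @staticmethod
    def _bucket(ts: datetime) -> tuple:
        return (ts.weekday(), ts.hour)

    def learn(self, ts: datetime, value: float) -> None:
        # Record the observation in its hour-of-week bucket.
        self.buckets[self._bucket(ts)].append(value)

    def is_anomalous(self, ts: datetime, value: float) -> bool:
        history = self.buckets.get(self._bucket(ts), [])
        if len(history) < self.min_samples:
            return False  # still learning this bucket; stay quiet
        mean = statistics.fmean(history)
        spread = statistics.pstdev(history) or 1e-9  # avoid dividing by zero
        return abs(value - mean) / spread > self.z_limit
```

Because 03:00 and 10:00 on a Monday have separate baselines, a value of 100 can be normal at night and anomalous at peak, which a single static line cannot express.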
- Workloads with strong daily/weekly patterns: Customer-facing workloads with peak hours, batch workloads with daily run windows, e-commerce with weekend patterns. The patterns are real, and alerting must accommodate them.
- Static thresholds either fire all night or miss daytime issues: A threshold tuned to daytime peak levels (for example, a floor on request rate) fires constantly during off-peak hours, when traffic naturally drops below it. A threshold loose enough for off-peak misses real daytime issues. Neither configuration works.
- Anomaly detection learns the patterns: The detector observes the workload over weeks and builds a model of normal behavior, including daily and weekly patterns. Alerts fire when the workload deviates from that learned normal.
- Cost: noise during pattern changes. Holidays, marketing launches, and traffic shifts produce alerts because they deviate from the learned normal. Even legitimate changes produce noise, so the team tunes or overrides during these periods.
- Tune or override during these periods: The team anticipates pattern-change events and adjusts alerting accordingly, with suppression during planned campaigns, widened thresholds during launches, and learning periods after deployments. This maintenance is part of the discipline.
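The overrides listed above can be modeled as time windows consulted before an anomaly alert fires. This sketch assumes a z-score style detector with a base limit; `Window`, `effective_z_limit`, and the "suppress"/"widen" labels are hypothetical. A post-deployment learning period could be modeled as a "suppress" window while the baseline rebuilds.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Window:
    """A planned pattern-change period, such as a campaign or a launch."""
    start: datetime
    end: datetime
    action: str                # hypothetical labels: "suppress" or "widen"
    widen_factor: float = 1.0  # multiplier on the base limit when action == "widen"


def effective_z_limit(ts: datetime, base_z: float,
                      windows: list[Window]) -> Optional[float]:
    """Return the z-score limit to apply at `ts`, or None if alerts are suppressed."""
    limit = base_z
    for w in windows:
        if not (w.start <= ts < w.end):
            continue
        if w.action == "suppress":
            return None  # any active suppression wins
        if w.action == "widen":
            # Overlapping widen windows: keep the widest limit.
            limit = max(limit, base_z * w.widen_factor)
    return limit
```

Keeping the windows as data, rather than editing the detector, makes the overrides easy to review alongside the planned events that justify them.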
Anomaly detection is right for workloads with strong patterns. The model absorbs the expected patterns, so the alerts can focus on real anomalies.
Combine
The two approaches are complementary: each catches failures the other misses.
- Anomaly detection for novelty: Anomaly detection catches "this is different from normal". The signal is about novelty; the alert flags something the system has not seen before.
- Static thresholds for absolutes: Static thresholds catch "this is past the acceptable line". The signal is about absolutes; the alert flags policy violations regardless of workload patterns.
- Both fire; either alone is incomplete: A workload that is normal for itself might still be past the acceptable line, and a workload that is fine on an absolute basis might be behaving unusually. Both signals matter.
- Most mature stacks layer both: This layering is what mature observability looks like. Static thresholds enforce policies, anomaly detection catches novelty, and the union covers more failure modes than either alone.
- Tune them together: The team reviews both layers in the same pass, keeping static thresholds and anomaly models current together so the maintenance is integrated.
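A minimal sketch of the layered evaluation: each layer is checked independently on the same sample, and the verdict records which one fired, so a responder can tell a policy breach from novelty. The `Verdict` enum, the `layered_verdict` function, and the use of a precomputed baseline mean and standard deviation are assumptions for illustration.

```python
from enum import Enum


class Verdict(Enum):
    OK = "ok"
    POLICY = "static threshold breached"
    NOVELTY = "anomalous for this workload"
    BOTH = "breached and anomalous"


def layered_verdict(value: float, static_limit: float,
                    baseline_mean: float, baseline_std: float,
                    z_limit: float = 3.0) -> Verdict:
    """Evaluate both layers independently and report which one(s) fired."""
    # Layer 1: absolute policy line.
    over_policy = value > static_limit
    # Layer 2: deviation from the learned baseline (z-score).
    spread = baseline_std if baseline_std > 0 else 1e-9
    novel = abs(value - baseline_mean) / spread > z_limit
    if over_policy and novel:
        return Verdict.BOTH
    if over_policy:
        return Verdict.POLICY
    if novel:
        return Verdict.NOVELTY
    return Verdict.OK
```

With a 200 ms limit, a latency of 250 ms against a learned baseline of 240 ± 10 ms returns `Verdict.POLICY` (normal for the workload, but past the line), while 120 ms against 60 ± 10 ms returns `Verdict.NOVELTY` (within policy, but unusual).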
Anomaly detection versus static thresholds is a question of both, not either. Nova AI Ops integrates with both alerting paradigms, surfaces patterns from each, and produces the layered alerting that mature observability needs.