ML-based detection of unusual patterns in metrics, logs, or traces that static thresholds cannot catch.
Anomaly detection in observability is the practice of identifying patterns in telemetry that deviate from a learned baseline. Where a static threshold says 'page me when CPU > 90%', anomaly detection says 'page me when this service's pattern looks unlike its last 30 days'. The advantage is context-awareness: a 5x latency spike during a known deploy is not the same as a 5x spike at 3am. Modern systems combine seasonal models, deploy correlation, and topology awareness to keep false-positive rates low.
Most outages don't start as a metric crossing a threshold; they start as a subtle change in pattern that a threshold-based monitor cannot see. Anomaly detection catches those changes early and reduces false alerts during expected behavior changes (deploys, traffic peaks, scheduled maintenance).
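The sketch below is an added illustration, not code from this page or from any product it describes. It fits a per-hour-of-day median/MAD baseline to constructed latency data, compares its verdict with a static threshold, and suppresses deviations that fall inside a known deploy window. The function names, the 500 ms threshold, the z limit of 6 and the deploy-window input are all assumptions.

```python
import math
import random
from statistics import median

STATIC_THRESHOLD_MS = 500.0  # assumed static alert level
ROBUST_Z_LIMIT = 6.0         # assumed sensitivity of the learned baseline

def synth_history(days=30, seed=7):
    """Constructed data: hourly p95 latency (ms), low at midnight, peaking at noon."""
    rng = random.Random(seed)
    out = []
    for h in range(days * 24):
        base = 120 + 180 * math.sin(math.pi * (h % 24) / 24) ** 2
        out.append((h, base + rng.gauss(0, 8)))
    return out

def fit_baseline(history):
    """Seasonal model: median and scaled MAD for each hour of the day."""
    buckets = {}
    for h, v in history:
        buckets.setdefault(h % 24, []).append(v)
    model = {}
    for hour, vals in buckets.items():
        med = median(vals)
        mad = median(abs(v - med) for v in vals) or 1.0  # avoid divide-by-zero
        model[hour] = (med, 1.4826 * mad)
    return model

def classify(model, h, value, deploy_windows=()):
    """Return (baseline verdict, whether the static threshold would fire)."""
    med, scale = model[h % 24]
    z = abs(value - med) / scale
    in_deploy = any(start <= h <= end for start, end in deploy_windows)
    if z <= ROBUST_Z_LIMIT:
        verdict = "normal"
    elif in_deploy:
        verdict = "suppressed"  # deviation explained by a known deploy
    else:
        verdict = "anomaly"
    return verdict, value > STATIC_THRESHOLD_MS

if __name__ == "__main__":
    model = fit_baseline(synth_history())
    day31 = 30 * 24
    print(classify(model, day31 + 3, 400.0))    # 3am drift, below the static line
    print(classify(model, day31 + 14, 1400.0, [(day31 + 13, day31 + 15)]))
```

The 3am reading of 400 ms never reaches the static threshold yet is far outside what the baseline has seen for that hour, while the 1400 ms reading trips the static threshold but is attributed to the deploy.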