Alerting on Derivatives, Not Absolutes
Some alerts work better on rate of change than on absolute value. This piece covers the pattern, example metrics, and when to use each.
When derivative wins
Alerting on derivatives is the discipline of alerting on the rate of change rather than (or in addition to) the absolute value. Some failure modes show up clearly in derivatives but slowly in absolutes; the derivative-based alert catches them earlier.
What favors derivatives:
- Disk usage: "Disk is at 80%" is a static threshold; the alert fires when usage crosses 80%. "Disk is filling at 5 GB per hour" is a derivative; the alert fires when the fill rate is high.
- Static threshold is late: By the time the disk reaches 80%, the team has limited time to react. The fill rate gives earlier warning, so the team can respond before the situation is critical.
- Memory leaks: A memory leak shows up gradually in absolute memory, and the absolute value might never hit a threshold during a particular shift. The derivative (memory growth rate) exposes the leak sooner, so the team can investigate while it is still small.
- Absolute memory crosses thresholds slowly: By the time absolute memory triggers an alert, the leak has been running for a while. The derivative-based alert detects it faster.
- Catches the leak earlier: Earlier detection means earlier remediation. The team has more time to investigate, the customer impact is smaller, and the resolution is calmer.
Derivative alerts are leading indicators. They produce earlier warning at the cost of more careful tuning.
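As a rough illustration of the disk example, the sketch below estimates a fill rate from a few usage samples and projects the time until the disk is full. The names `Sample`, `fill_rate_gb_per_hour`, `hours_until_full`, and `derivative_alert` are hypothetical, and the 5 GB per hour threshold is only the figure quoted above, not a recommendation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    t_seconds: float  # sample timestamp, in seconds
    used_gb: float    # disk space in use at that time, in GB


def fill_rate_gb_per_hour(samples: List[Sample]) -> float:
    """Average fill rate between the first and last sample."""
    first, last = samples[0], samples[-1]
    elapsed_hours = (last.t_seconds - first.t_seconds) / 3600.0
    if elapsed_hours <= 0:
        raise ValueError("samples must span a positive time window")
    return (last.used_gb - first.used_gb) / elapsed_hours


def hours_until_full(samples: List[Sample], capacity_gb: float) -> float:
    """Project time to full at the current rate; infinite if not growing."""
    rate = fill_rate_gb_per_hour(samples)
    if rate <= 0:
        return float("inf")
    return (capacity_gb - samples[-1].used_gb) / rate


def derivative_alert(samples: List[Sample], max_rate_gb_per_hour: float = 5.0) -> bool:
    """Fire when the fill rate exceeds the (assumed) rate threshold."""
    return fill_rate_gb_per_hour(samples) > max_rate_gb_per_hour


# A 1000 GB disk at roughly 50% usage, gaining 3 GB every 30 minutes.
samples = [Sample(0, 500.0), Sample(1800, 503.0), Sample(3600, 506.0)]
rate = fill_rate_gb_per_hour(samples)
eta = hours_until_full(samples, capacity_gb=1000.0)
fires = derivative_alert(samples)
```

In this scenario the disk is far below an 80% static threshold, yet the rate-based check already fires. A real deployment would likely smooth the rate over a longer window to avoid reacting to short bursts.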
When absolute wins
Some metrics warrant absolute thresholds. SLO compliance, hard capacity caps, and similar policy-driven metrics need absolute alerting; the rate of change is secondary.
- SLO compliance: Latency must be under X. The threshold is the policy, and the alert fires when the policy is violated. The rate of change is interesting but does not directly indicate an SLO violation.
- Latency must be under X: The team committed to a specific number. Exceeding that number violates the commitment, so the alert is on the violation directly.
- Threshold matters; rate of change is secondary: The user impact occurs at the threshold, not at the change. Absolute alerts are the right primitive for these cases.
- Capacity utilization that has a hard cap: Some resources have hard caps, such as license counts, quotas, and fixed-size capacity. Approaching the cap is what matters; the rate of approach is secondary to the absolute level.
- Approaching the cap is what matters: When the cap is hard, the team needs to know they are close to it. The absolute level relative to the cap is the right signal; the derivative is supplementary.
Absolute alerts are right when policy or hard limits dictate behavior. The threshold is the meaningful event.
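The two absolute cases above reduce to simple comparisons. A minimal sketch, assuming hypothetical helper names `slo_violated` and `near_hard_cap` and an illustrative 90% warning fraction that does not come from the text:

```python
def slo_violated(latency_ms: float, slo_ms: float) -> bool:
    """Fire when observed latency exceeds the committed number."""
    return latency_ms > slo_ms


def near_hard_cap(used: float, cap: float, warn_fraction: float = 0.9) -> bool:
    """Fire when usage reaches an assumed fraction of a hard cap."""
    if cap <= 0:
        raise ValueError("cap must be positive")
    return used / cap >= warn_fraction


# Latency of 310 ms against a 300 ms commitment: the policy is violated.
latency_alert = slo_violated(latency_ms=310.0, slo_ms=300.0)

# 46 of 50 license seats in use: 92% of the cap, above the 90% warning level.
license_alert = near_hard_cap(used=46, cap=50)
```

Neither check looks at how fast the value is moving; the level relative to the policy or the cap is the whole signal.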
Combine
The right approach is often both. Alert on the absolute when the threshold matters; alert on the derivative when the rate matters; combine them when both are relevant.
- Often both: A disk that is filling at an unusual rate and is also approaching capacity is a stronger signal than either alone. Watching both signals makes the alerting more precise.
- Alert on either the absolute or the derivative: Two alerts, ORed together. Either condition fires the alert; the team is notified when either signal warrants attention.
- Two alerts, ORed: The semantics are "alert me if either of these is concerning". The combination catches more cases than either alone; the false-positive rate is at most the sum of the two individual rates, and close to it when the two rarely fire together.
- The cheaper safety mechanism: The combination is a cheap way to add safety. Implementing both alerts is little extra work, and the additional coverage is real.
- Tune them together: The team tunes both alerts in the same review, so that the thresholds are consistent with each other, the false-positive rates are balanced, and the alerting strategy is coherent.
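The ORed combination can be sketched as a single check that reports which condition fired. The function name `combined_alert` and both default thresholds (80% usage, 5 GB per hour) are assumptions for illustration, reusing the figures from the disk example.

```python
from typing import List


def combined_alert(
    used_fraction: float,
    fill_rate_gb_per_hour: float,
    max_used_fraction: float = 0.80,     # assumed absolute threshold
    max_rate_gb_per_hour: float = 5.0,   # assumed derivative threshold
) -> List[str]:
    """Return the reasons the alert fired; an empty list means no alert."""
    reasons = []
    if used_fraction >= max_used_fraction:
        reasons.append("absolute")
    if fill_rate_gb_per_hour > max_rate_gb_per_hour:
        reasons.append("derivative")
    return reasons


# Half-full disk filling fast: only the derivative condition fires.
fast_fill = combined_alert(used_fraction=0.50, fill_rate_gb_per_hour=6.0)

# Nearly full disk filling slowly: only the absolute condition fires.
nearly_full = combined_alert(used_fraction=0.85, fill_rate_gb_per_hour=1.0)

# Both conditions at once: the stronger signal described above.
both = combined_alert(used_fraction=0.85, fill_rate_gb_per_hour=6.0)
```

Returning the reasons rather than a bare boolean keeps the OR semantics (any non-empty result notifies the team) while letting a responder see whether one or both signals fired.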
Alerting on derivatives produces earlier detection for specific failure modes. Nova AI Ops integrates with metric data, supports both absolute and derivative alerting, and helps teams choose the right primitives for each metric.