AI Agent Operations

Catch the incident before the page,
an average of four hours of warning

Predictive Detection learns the normal shape of every metric, log volume, and trace pattern in your system. When that shape starts to drift toward a previously seen incident pattern, the model fires an early-warning signal. Median lead time across customers: 4 hours and 12 minutes before the symptoms trigger your alerts.

app.novaaiops.com / predictive-detection

● LIVE

Predictions · last 7 days

apr 22 · payments p95 drift · 3h 42m before alert · prevented

apr 20 · cart-checkout error rate climb · 5h 12m before alert · prevented

apr 18 · memory-leak signature · 2h 8m before alert · ignored, then incident

apr 15 · connection-pool saturation · 6h 22m before alert · prevented

avg lead time · 4h 12m

How the Model Works

Per-service baselines plus pattern matching

For each service, Nova learns a per-window baseline of latency, error rate, log volume, and queue depth. A drift score is computed every minute against the baseline. Separately, Nova carries a library of 120+ failure-mode signatures (memory leak, connection-pool saturation, cache stampede, GC death spiral) and matches the live shape against them. A prediction fires when both layers agree.

✓ Layer 1 · baseline drift: per-service rolling baseline with confidence interval; drift = how many sigmas off baseline
✓ Layer 2 · signature match: 120+ named failure-mode signatures (memory leak, deadlock, GC death spiral) matched against the shape
✓ Both must agree: a prediction needs drift > threshold AND a signature match; requiring both is what drives the model's precision
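
The two-layer rule can be sketched in a few lines of Python. This is an illustration only, not Nova's implementation: the function names, the 3-sigma drift threshold, and the use of Pearson correlation as the shape matcher are all assumptions made for the example.

```python
from statistics import mean, stdev


def drift_sigmas(baseline, current):
    """Layer 1: how many standard deviations `current` sits from the baseline mean."""
    sigma = stdev(baseline)
    if sigma == 0:
        return 0.0
    return abs(current - mean(baseline)) / sigma


def pearson(xs, ys):
    """Pearson correlation of two equal-length series (0.0 if either is flat)."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def matches_signature(window, signature, min_corr=0.9):
    """Layer 2: does the live shape resemble a stored failure-mode shape?

    Correlation is a stand-in here; the real matcher is not described in the text.
    """
    return pearson(window, signature) >= min_corr


def should_predict(baseline, window, signature, drift_threshold=3.0):
    """Fire only when drift exceeds the threshold AND the signature matches."""
    drift = drift_sigmas(baseline, window[-1])
    return drift > drift_threshold and matches_signature(window, signature)


# Hypothetical data: a steady baseline and a ramp-shaped signature.
baseline = [100, 102, 98, 101, 99, 100, 103, 97]
ramp_signature = [0, 1, 2, 3, 4]

climbing = [100, 102, 105, 107, 110]   # large drift, ramp-like shape
spike = [100, 100, 100, 100, 110]      # large drift, wrong shape
print(should_predict(baseline, climbing, ramp_signature))  # True
print(should_predict(baseline, spike, ramp_signature))     # False
```

The spike case shows why the second layer matters: the final point drifts just as far as in the climbing case, but without a matching shape no prediction fires.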

app.novaaiops.com / predictive-detection · model

Live · payments

DRIFT vs BASELINE

drift · 3.2 sigma

signature · connection-pool saturation

both agree · prediction · fired

Confidence Thresholds

You set the noise/lead-time tradeoff

Predictions ship with a confidence score (0-100). You set the threshold for "page on-call" vs "post in Slack" vs "log only". Default thresholds favor lead time: Slack at 60, page at 80. Raise them if your team finds the early-warning signal noisy. Thresholds are set per service, so tier-0 services can be more sensitive than tier-2.

✓ Three threshold tiers: log-only (40-60), slack-notify (60-80), page-on-call (80-100); defaults shipped, override per service
✓ Per-service tuning: tier-0 services page at 70, tier-2 page at 95; match noise tolerance to service criticality
✓ Suppress during maintenance: maintenance windows defined in the on-call page suppress predictions automatically
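
As a rough sketch of how the tiers route a prediction, the Python below maps a confidence score to an action. The `Thresholds` class, the `route` function, and the action for scores under the log-only floor are hypothetical; the default values come from the tiers above, and the payments override uses the numbers shown in the thresholds panel.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    """Lower bounds for each tier; defaults mirror the shipped 40/60/80 tiers."""
    log_only: int = 40
    slack_notify: int = 60
    page_on_call: int = 80


def route(confidence, thresholds=Thresholds(), in_maintenance=False):
    """Map a 0-100 confidence score to an action for one service."""
    if in_maintenance:
        return "suppressed"       # maintenance windows suppress every tier
    if confidence >= thresholds.page_on_call:
        return "page-on-call"
    if confidence >= thresholds.slack_notify:
        return "slack-notify"
    if confidence >= thresholds.log_only:
        return "log-only"
    return "ignored"              # assumption: below the floor, nothing is recorded


# Per-service overrides, keyed by service name (hypothetical structure).
overrides = {"payments": Thresholds(log_only=50, slack_notify=65, page_on_call=82)}

print(route(88, overrides["payments"]))                       # page-on-call
print(route(72))                                              # slack-notify
print(route(88, overrides["payments"], in_maintenance=True))  # suppressed
```

Checking the highest tier first means a score can never land in two tiers, even when a per-service override moves the boundaries.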

app.novaaiops.com / predictive-detection · thresholds

Thresholds · payments

log-only · ≥ 50

slack-notify · ≥ 65

page-on-call · ≥ 82

maint window · all suppressed

last 7d page rate · 2 / week

Drift Detection on the Model

The model itself is monitored

Models drift, and ours is no exception. Nova tracks prediction precision over a rolling 7-day window. When precision falls below your threshold, the model is flagged for retraining. The previous version stays active during the retrain, so coverage never drops. You see the drift before your customers do.

✓ Rolling precision window: every prediction has an outcome at T+24h; precision is computed per service over the trailing 7 days
✓ Auto-retrain trigger: precision below threshold flags retrain; previous model stays active until retrain completes
✓ No silent drift: precision drops show up on the Service Health Matrix as a meta-SLI on the prediction system itself
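
The precision check can be sketched as follows. This is a minimal illustration under stated assumptions: the function names and the `(fired_at, was_real_incident)` record shape are hypothetical, precision is taken to mean the share of fired predictions that turned out to be real incidents, and the 90% threshold is the value shown in the panel below.

```python
from datetime import datetime, timedelta

OUTCOME_DELAY = timedelta(hours=24)  # outcomes are recorded at T+24h
WINDOW = timedelta(days=7)           # rolling precision window


def rolling_precision(predictions, now):
    """Share of resolved predictions in the window that were real incidents.

    `predictions` is a list of (fired_at, was_real_incident) pairs.
    Returns None when no prediction in the window has an outcome yet.
    """
    resolved = [
        hit
        for fired_at, hit in predictions
        if fired_at >= now - WINDOW and fired_at + OUTCOME_DELAY <= now
    ]
    if not resolved:
        return None
    return sum(resolved) / len(resolved)


def needs_retrain(precision, threshold=0.90):
    """Flag a retrain when precision is known and below the threshold."""
    return precision is not None and precision < threshold
```

Predictions younger than 24 hours are left out because their outcome is not yet known; counting them as misses would understate precision and could trigger retrains that are not needed.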

app.novaaiops.com / predictive-detection · drift

Model precision · 7d

PRECISION

current precision · 94.2%

retrain threshold · 90%

last retrain · apr 15 · 3 days under threshold

How to Use the Page

A first-time tour

Open the page. The top panel shows live predictions across all your services, ranked by confidence. Click any prediction to see the drift chart, the matching signature, the suspect changes from Nova Rewind, and the suggested runbook. Approve "auto-remediate" on a prediction to let the agent fleet act before the page fires.

✓ Top panel · live predictions: live ranked list; click any row for the drill-in (drift chart, signature, suspects, runbook)
✓ Auto-remediate toggle: per-prediction toggle that hands the prediction to the agent fleet for action, gated by Approval Queue
✓ History tab: every prediction with its outcome at T+24h, used for tuning thresholds and reviewing precision

app.novaaiops.com / predictive-detection · live

Live · 14:42

payments · pool saturation · 88

eta · incident likely in 3h 12m

checkout · drift on p95 · 72

eta · incident likely in 5h 04m

identity · log volume drop · 52

eta · watch only · log-only tier

Stop waiting for the page

Predictive Detection turns operations from reactive to proactive. Most of the time, you fix the problem before anyone has a reason to file a ticket.

