AI Agent Operations

Run the runbook in a digital twin first, see what would happen, then decide

Simulation Engine runs a runbook plan against a digital twin of your environment. It simulates the effect of each step, predicts how your SLIs will move, and reports the expected outcome. Use it to confirm that a remediation will actually fix the SLI, or to catch a runbook that would break something the planner did not consider.


app.novaaiops.com / simulation-engine


Simulation · scale-down payments-api

step 1: cordon nodes · ok

step 2: drain pods · ok

step 3: scale 8 → 4 · ok

predicted p95: 142 → 240 ms (above 200 ms SLO)

verdict: do not proceed · scale to 6 instead

How the Twin Works

Live state, frozen for the run

The digital twin is a snapshot of your real environment: service graph, current load, current SLIs, and current resource utilization. The simulation engine applies the planned runbook to the twin step by step and recomputes the predicted SLIs after each step. Because the twin is a snapshot, nothing the simulation does affects production. A simulation runs in seconds.

✓ Snapshot of real state: the twin captures live service graph, load, SLIs, and resources at simulation start
✓ Step-by-step apply: each runbook step is applied to the twin and predicted SLIs recomputed
✓ No production effect: the twin is read-only against your real environment; nothing changes outside the twin

app.novaaiops.com / simulation-engine · twin

Twin state

captured at: 14:42 (live snapshot)

services: 96 (full graph)

current load: 22.4k rps

twin lifetime: 5 minutes (per run)
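The snapshot-and-apply idea can be sketched in a few lines of Python. This is an illustrative sketch only, not the Simulation Engine's implementation: `TwinState`, `simulate`, and the two step functions are hypothetical names, and the state is reduced to replica counts and cordoned nodes.

```python
import copy
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class TwinState:
    """Hypothetical, heavily simplified view of what a snapshot captures."""
    replicas: dict[str, int] = field(default_factory=dict)
    cordoned_nodes: set[str] = field(default_factory=set)


Step = Callable[[TwinState], None]


def simulate(live: TwinState, steps: list[tuple[str, Step]]) -> list[tuple[str, TwinState]]:
    """Apply each step to a copy of the live state and record the state after every step."""
    twin = copy.deepcopy(live)  # the snapshot: later changes never touch `live`
    history = []
    for name, step in steps:
        step(twin)
        history.append((name, copy.deepcopy(twin)))
    return history


# Hypothetical runbook steps, modelled as functions that mutate the twin.
def cordon_nodes(state: TwinState) -> None:
    state.cordoned_nodes.add("node-a")


def scale_down(state: TwinState) -> None:
    state.replicas["payments-api"] = 4


live = TwinState(replicas={"payments-api": 8})
history = simulate(live, [("cordon nodes", cordon_nodes), ("scale 8 -> 4", scale_down)])
```

After the run, `live` still reports 8 replicas and no cordoned nodes, while the last entry in `history` shows the twin at 4 replicas. The deep copy is what keeps the simulation from leaking into the state it was taken from.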

Predicted SLIs

See the SLO impact before you act

After each runbook step, the engine predicts the SLI values: p95 latency, error rate, saturation, and any custom SLIs. The prediction uses the same models that drive Predictive Detection. A simulation that predicts an SLO breach is flagged, and the engine recommends adjusting the plan before proceeding.

✓ Per-step SLI prediction: every step shows the predicted SLI movement, not just the final state
✓ SLO-aware: predictions are compared against your SLO targets, and breaches are flagged
✓ Recommended adjustment: when a step would breach an SLO, the engine recommends an alternative (smaller scale, slower ramp)

app.novaaiops.com / simulation-engine · sli

SLI prediction · per step

after step 1: p95 142 ms (no change)

after step 2: p95 158 ms (within SLO)

after step 3: p95 240 ms (over SLO)

recommendation: scale to 6 instead of 4
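The per-step SLO check can be illustrated with a short Python sketch. It assumes the predictions already exist (the prediction models themselves are not described here) and only shows the comparison against a target. `StepPrediction` and `first_breach` are hypothetical names; the numbers are the ones from the panel above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StepPrediction:
    step: str
    p95_ms: float  # predicted p95 latency after this step


def first_breach(predictions: list[StepPrediction], slo_p95_ms: float) -> Optional[StepPrediction]:
    """Return the first step whose predicted p95 exceeds the SLO, or None if none does."""
    for prediction in predictions:
        if prediction.p95_ms > slo_p95_ms:
            return prediction
    return None


predictions = [
    StepPrediction("cordon nodes", 142),
    StepPrediction("drain pods", 158),
    StepPrediction("scale 8 -> 4", 240),
]

breach = first_breach(predictions, slo_p95_ms=200)
if breach is None:
    verdict = "proceed"
else:
    verdict = f"do not proceed: '{breach.step}' predicts p95 {breach.p95_ms:.0f} ms"
```

Checking every step rather than only the final state matters because an intermediate step can breach the SLO even when the end state recovers. Producing an alternative such as "scale to 6" would require re-running the prediction with a different plan, which this sketch does not attempt.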

When to Simulate

Big changes, ambiguous changes, recovery plans

Simulate when the change is big (scale operations, mass restarts, schema migrations), when the change is ambiguous (multiple agents disagreed in debate), or when the change is a recovery plan during an active incident. For routine work, simulation is overkill. The engine suggests a simulation when one is warranted.

✓ Big changes: scale ops, mass restarts, and migrations have a high blast radius that warrants a sim
✓ Ambiguous changes: when debate or arbiter could not converge, sim is the tiebreaker
✓ Recovery during incidents: high-stakes recovery plans benefit from a sim before the operator approves

app.novaaiops.com / simulation-engine · when

Auto-suggested when

blast radius > 10% MAU: simulate

debate did not converge: simulate

during sev-1: simulate

routine restart: skip · low value
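The auto-suggest rules in the panel above amount to a small decision function. The Python sketch below is one way to express them. `ChangeContext` and `simulation_reasons` are hypothetical names, and the inputs are assumed to be available from elsewhere in the platform.

```python
from dataclasses import dataclass


@dataclass
class ChangeContext:
    blast_radius_pct_mau: float  # share of monthly active users the change can reach
    debate_converged: bool       # did the agents agree on the plan?
    active_sev1: bool            # is a sev-1 incident in progress?


def simulation_reasons(ctx: ChangeContext) -> list[str]:
    """Return the reasons to simulate; an empty list means the simulation can be skipped."""
    reasons = []
    if ctx.blast_radius_pct_mau > 10:
        reasons.append("blast radius > 10% MAU")
    if not ctx.debate_converged:
        reasons.append("debate did not converge")
    if ctx.active_sev1:
        reasons.append("active sev-1")
    return reasons


routine_restart = ChangeContext(blast_radius_pct_mau=0.5, debate_converged=True, active_sev1=False)
big_scale_down = ChangeContext(blast_radius_pct_mau=35.0, debate_converged=True, active_sev1=False)

skip = simulation_reasons(routine_restart)      # empty: skip, low value
suggest = simulation_reasons(big_scale_down)    # one reason: blast radius
```

Returning the list of reasons instead of a bare boolean lets the operator see why a simulation was suggested.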

Calibration

How accurate are the predictions

For every simulated runbook that actually runs, the engine compares predicted and actual SLI movement and reports calibration. Good calibration (predictions within ±10%) builds trust; poor calibration triggers model review. The engine's prediction accuracy is itself tracked as a meta-SLI on Service Health Matrix.

✓ Predicted vs actual: every real run with a prior simulation generates a calibration data point
✓ ±10% target: predictions inside ±10% are considered well-calibrated; anything outside triggers review
✓ Meta-SLI: the engine has its own SLO on Service Health Matrix; it is held to the same standard

app.novaaiops.com / simulation-engine · calibration

Calibration · this month

simulations vs actual: 42 paired

within ±10%: 38 (90%)

over-predict: 2 (actual impact less than predicted)

under-predict: 2 (actual impact worse than predicted · review)
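A calibration report like the one above can be sketched in Python as follows. How the ±10% band is measured is not specified here; this sketch assumes it is the relative error of the predicted SLI value against the actual one, for an SLI where higher is worse (such as p95 latency). `classify` and `calibration_report` are hypothetical names, and the sample pairs are made up for illustration.

```python
from collections import Counter


def classify(predicted: float, actual: float, tolerance: float = 0.10) -> str:
    """Bucket one predicted/actual pair by relative error (assumed definition)."""
    if actual == 0:
        raise ValueError("relative error is undefined when the actual value is 0")
    error = (predicted - actual) / actual
    if abs(error) <= tolerance:
        return "within"
    # Higher-is-worse SLI: predicting too high means reality was milder than forecast.
    return "over-predict" if error > 0 else "under-predict"


def calibration_report(pairs: list[tuple[float, float]]) -> Counter:
    """Count how many (predicted, actual) pairs fall into each bucket."""
    return Counter(classify(predicted, actual) for predicted, actual in pairs)


# Illustrative (predicted p95 ms, actual p95 ms) pairs, not real measurements.
pairs = [(240, 230), (158, 160), (240, 200), (180, 240)]
report = calibration_report(pairs)
within_share = report["within"] / len(pairs)
```

Under-predictions are the bucket to watch: they are the cases where the real run turned out worse than the simulation said it would.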


Try it in the twin first

Simulation is a 10-second check that catches "actually, this would break payments" before the real deploy.

