AI Agent Operations

Run the runbook in a digital twin first, see what would happen, then decide

Simulation Engine runs a runbook plan against a digital twin of your environment. It simulates the effect at each step, predicts the SLI movement, and reports the expected outcome. Use it to validate that a remediation will actually fix the SLI, or to catch a runbook that would have broken something the planner did not consider.

  • Step-by-step simulation
  • SLI predicted per step
  • Cheaper than canary deploy
  • Replays from real recent state
How the Twin Works

Live state, frozen for the run

The digital twin is a snapshot of your real environment: service graph, current load, current SLIs, and current resource utilization. The simulation engine applies the planned runbook step by step against the twin and recomputes the predicted SLIs after each step. Because the twin is only a snapshot, nothing the simulation does affects production. The simulation runs in seconds.

  • Snapshot of real state: the twin captures live service graph, load, SLIs, and resources at simulation start
  • Step-by-step apply: each runbook step is applied to the twin and predicted SLIs recomputed
  • No production effect: the twin is read-only against your real environment; nothing changes outside the twin
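The snapshot-then-apply loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the engine's implementation: the `Twin` class, the `simulate` and `scale_to` functions, and the toy saturation model (100 requests per second per replica) are all hypothetical names and numbers chosen for the example.

```python
from copy import deepcopy
from dataclasses import dataclass
from typing import Callable, Dict, List

Sli = Dict[str, float]


@dataclass
class Twin:
    """Hypothetical, heavily simplified stand-in for an environment snapshot."""
    replicas: int
    load_rps: float

    def predicted_slis(self) -> Sli:
        # Toy model (assumption): each replica handles 100 requests per second.
        capacity = self.replicas * 100.0
        return {"saturation": self.load_rps / capacity}


Step = Callable[[Twin], None]


def simulate(live: Twin, steps: List[Step]) -> List[Sli]:
    """Apply each runbook step to a frozen copy and record SLIs after every step."""
    twin = deepcopy(live)  # the live object is never mutated
    trace: List[Sli] = []
    for step in steps:
        step(twin)
        trace.append(twin.predicted_slis())
    return trace


def scale_to(replicas: int) -> Step:
    """Hypothetical runbook step: set the replica count."""
    def apply(twin: Twin) -> None:
        twin.replicas = replicas
    return apply


live = Twin(replicas=4, load_rps=360.0)
trace = simulate(live, [scale_to(6), scale_to(8)])
```

The point of the deep copy is that `live` still reports four replicas after the run, while `trace` holds one predicted SLI reading per step.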
Predicted SLIs

See the SLO impact before you act

After each runbook step, the engine predicts the SLI values: p95 latency, error rate, saturation, and any custom SLIs. The prediction uses the same models that drive Predictive Detection. A simulation that predicts an SLO breach is flagged, and the engine recommends adjusting the plan before proceeding.

  • Per-step SLI prediction: every step shows the predicted SLI movement, not just the final state
  • SLO-aware: predictions compared against your SLO targets; breaches flagged
  • Recommended adjustment: when a step would breach an SLO, the engine recommends an alternative (smaller scale, slower ramp)
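To show why per-step prediction matters, here is a small Python sketch that compares each step's predicted SLIs against SLO targets. The target values, the SLI names, and the `flag_breaches` function are assumptions made for the example; the engine's real models and thresholds are not described in this page.

```python
from typing import Dict, List, Tuple

# Hypothetical SLO targets, expressed as upper bounds.
SLO_TARGETS: Dict[str, float] = {"p95_latency_ms": 300.0, "error_rate": 0.01}


def flag_breaches(
    trace: List[Dict[str, float]], targets: Dict[str, float]
) -> List[Tuple[int, str, float]]:
    """Return (step number, SLI name, predicted value) for every predicted breach."""
    breaches = []
    for step_number, slis in enumerate(trace, start=1):
        for name, limit in targets.items():
            value = slis.get(name)
            if value is not None and value > limit:
                breaches.append((step_number, name, value))
    return breaches


# Illustrative predictions: step 2 breaches latency even though the final state is healthy.
predicted = [
    {"p95_latency_ms": 240.0, "error_rate": 0.004},
    {"p95_latency_ms": 340.0, "error_rate": 0.006},
    {"p95_latency_ms": 210.0, "error_rate": 0.002},
]
breaches = flag_breaches(predicted, SLO_TARGETS)
```

In this made-up trace a final-state check would pass, but the per-step check flags step 2, which is the kind of case where a smaller scale or slower ramp would be recommended.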
When to Simulate

Big changes, ambiguous changes, recovery plans

Simulate when the change is big (scale operations, mass restarts, schema migrations), when the change is ambiguous (multiple agents disagreed in debate), or when the change is a recovery plan during an active incident. For routine work, simulation is overkill; the engine suggests a simulation when one is warranted.

  • Big changes: scale ops, mass restarts, and migrations have a high blast radius that warrants a sim
  • Ambiguous changes: when debate or arbiter could not converge, sim is the tiebreaker
  • Recovery during incidents: high-stakes recovery plans benefit from a sim before the operator approves
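The three triggers above could be expressed as a simple decision rule. The sketch below is illustrative only: `PlannedChange`, its fields, and the `BIG_CHANGE_KINDS` set are hypothetical, and the engine's actual suggestion logic is not documented here.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical labels for high blast radius change types.
BIG_CHANGE_KINDS = {"scale", "mass_restart", "schema_migration"}


@dataclass
class PlannedChange:
    kind: str               # e.g. "scale" or "config_tweak"
    agents_converged: bool  # False when debate or arbiter could not agree
    active_incident: bool   # True when this is a recovery plan mid-incident


def simulation_reasons(change: PlannedChange) -> List[str]:
    """Return the reasons a simulation is warranted; an empty list means skip it."""
    reasons = []
    if change.kind in BIG_CHANGE_KINDS:
        reasons.append("big change")
    if not change.agents_converged:
        reasons.append("ambiguous change")
    if change.active_incident:
        reasons.append("recovery during incident")
    return reasons


routine = PlannedChange(kind="config_tweak", agents_converged=True, active_incident=False)
risky = PlannedChange(kind="mass_restart", agents_converged=False, active_incident=True)
```

Returning the list of reasons rather than a bare boolean lets an operator see why a simulation was suggested.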
Calibration

How accurate are the predictions

For every simulated runbook that actually runs, the engine compares the predicted SLI movement with the actual movement and reports calibration. Good calibration (predictions within ±10%) builds trust; poor calibration triggers a model review. The engine's prediction accuracy is itself tracked as a meta-SLI on Service Health Matrix.

  • Predicted vs actual: every real run with a prior simulation generates a calibration data point
  • ±10% target: predictions inside ±10% are considered well-calibrated; anything outside triggers a model review
  • Meta-SLI: the engine has its own SLO on Service Health Matrix; it is held to the same standard
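One way to read the ±10% target is as a relative error on the SLI movement. The Python sketch below makes that interpretation explicit; it is an assumption, since this page does not say whether the tolerance is relative to the actual movement or to something else, and the function names are hypothetical.

```python
def relative_error(predicted_delta: float, actual_delta: float) -> float:
    """Relative error of the predicted SLI movement against the actual movement."""
    if actual_delta == 0:
        # No actual movement: only a zero prediction counts as exact.
        return 0.0 if predicted_delta == 0 else float("inf")
    return abs(predicted_delta - actual_delta) / abs(actual_delta)


def is_well_calibrated(
    predicted_delta: float, actual_delta: float, tolerance: float = 0.10
) -> bool:
    """True when the prediction lands within the tolerance (10% by default)."""
    return relative_error(predicted_delta, actual_delta) <= tolerance


# Illustrative data points: p95 latency movement in milliseconds.
close_call = is_well_calibrated(predicted_delta=-80.0, actual_delta=-75.0)
needs_review = not is_well_calibrated(predicted_delta=-80.0, actual_delta=-60.0)
```

In the first made-up data point the prediction is off by 5 ms on a 75 ms movement (about 6.7%), so it counts as well-calibrated; in the second it is off by 20 ms on 60 ms (about 33%), which would trigger a review.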

Try it in the twin first

Simulation is a 10-second check that catches "actually, this would break payments" before the real deploy.
