AI-Powered Response

Runbooks that
execute themselves

Nova's AI Runbook engine generates response playbooks on demand, covering every severity from SEV-4 degradations to SEV-1 global outages. When an incident fires, Nova selects the matching runbook, simulates the blast radius, and executes the response, with rollback steps already prepared before a human ever touches the keyboard.


app.novaaiops.com · AI Runbooks

● LIVE

AI Runbook library · 8 active playbooks

RB-01 · Latency spike › rollback deploy · used 24x · auto

RB-02 · OOM pod restart › heap dump · used 18x · auto

RB-03 · SSL cert rotation · used 11x · approve

RB-04 · Traffic saturation › scale HPA · used 42x · auto

RB-05 · DB pool exhaustion › recycle · used 9x · auto

RB-06 · CDN purge › edge revalidate · used 6x · approve

RB-07 · Region failover › route53 · used 2x · manual

RB-08 · Partial deploy rollback · used 15x · auto

Scenario Runner

AI picks the right runbook. You approve. It executes.

Nova ships with pre-built scenario runbooks for the most common production failures: latency spikes, partial outages, memory pressure events, SSL failures, and traffic saturation. When an incident fires, the AI engine classifies the failure type, selects the matching runbook, and presents it for one-click execution, with every step, expected outcome, and rollback procedure laid out before you approve.

✓ Latency spike runbooks: isolate slow services, roll back recent deploys, scale affected pods, and verify recovery automatically
✓ Memory pressure playbooks: heap dump collection, OOM analysis, pod restart sequencing with traffic drain built in
✓ SSL failure remediation: cert rotation, CDN cache purge, and health check validation executed in the correct order every time
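The classify-then-select flow described above can be pictured roughly as follows. This is a minimal sketch, not Nova's actual engine: the `Runbook` type, the `LIBRARY` entries, the `classify` rules, and every threshold except the p95 > 800ms latency trigger shown in the Scenario Runner panel are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Runbook:
    rb_id: str
    failure_type: str
    mode: str  # "auto", "approve", or "manual"


# Hypothetical library mirroring a few entries from the runbook list on this page.
LIBRARY = [
    Runbook("RB-01", "latency-spike", "auto"),
    Runbook("RB-02", "oom", "auto"),
    Runbook("RB-03", "ssl-failure", "approve"),
    Runbook("RB-04", "traffic-saturation", "auto"),
]


def classify(signal: dict) -> Optional[str]:
    """Toy rule-based classifier; the rules and thresholds are assumptions."""
    if signal.get("cert_days_left", 365) <= 0:
        return "ssl-failure"
    if signal.get("oom_kills", 0) > 0:
        return "oom"
    if signal.get("p95_ms", 0) > 800:
        return "latency-spike"
    if signal.get("cpu_pct", 0) > 90:
        return "traffic-saturation"
    return None


def select_runbook(signal: dict) -> Optional[Runbook]:
    """Return the first runbook whose failure type matches the classification."""
    failure_type = classify(signal)
    for rb in LIBRARY:
        if rb.failure_type == failure_type:
            return rb
    return None


selected = select_runbook({"p95_ms": 950, "endpoint": "/checkout"})
print(selected)
```

A runbook whose `mode` is "approve" or "manual" would be presented for sign-off rather than executed immediately.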

app.novaaiops.com · Scenario Runner

RB-01 · Latency spike · executing step 3 / 5

▸ STEP 3: roll back deploy v2.14.3

$ kubectl rollout undo deployment/checkout -n prod
deployment.apps/checkout rolled back

00:04 · detect p95 > 800ms on /checkout

00:09 · classify: latency-spike

00:12 · select RB-01 · confidence 94%

00:18 · rollback in flight …

-- · verify /health (pending)

What-If Simulation

Simulate blast radius before it happens, not during the outage.

The What-If engine lets you simulate failure modes against your live service graph before any real incident occurs. Run SEV-1 Global, SEV-2 Regional, Slow Burn, or Cascade scenarios to see exactly which services go down, in what order, and what the estimated user impact and revenue exposure look like, so your team knows the playbook before 3 AM.

✓ SEV-1 Global simulation: model complete region failure; identify which services have no failover path and fix them proactively
✓ Cascade failure modeling: inject a single service failure and watch the dependency graph propagate to find unexpected blast radius
✓ Slow burn scenarios: simulate gradual degradation over hours to test whether your alerting catches it before SLO breach
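One way to picture cascade failure modeling is a breadth-first walk over the dependency graph, starting from the injected failure. The sketch below assumes a hypothetical `DEPENDS_ON` map and a `simulate_cascade` helper; neither is part of Nova's documented interface, and a real engine would also weigh timing, retries, and failover paths.

```python
from collections import deque

# Hypothetical dependency map: service -> services it depends on.
DEPENDS_ON = {
    "checkout": ["lb-us-east", "orders"],
    "orders": ["lb-us-east"],
    "receipts": ["orders"],
    "search": [],
}


def simulate_cascade(depends_on, failed):
    """Return {service: hop count} for every service the failure can reach."""
    # Invert the map so we can walk from a dependency to its dependents.
    dependents = {}
    for svc, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(svc)

    hops = {failed: 0}
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for svc in dependents.get(current, []):
            if svc not in hops:  # first visit is the shortest path
                hops[svc] = hops[current] + 1
                queue.append(svc)
    return hops


impact = simulate_cascade(DEPENDS_ON, "lb-us-east")
for svc, hop in sorted(impact.items(), key=lambda kv: kv[1]):
    print(f"hop {hop}: {svc}")
```

Services that never appear in the result (here, `search`) sit outside the blast radius of the injected failure.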

app.novaaiops.com · What-If Engine

What-If · SEV-1 Global · us-east-1 down

Affected: 17 svc · Users: 2.1M · Rev/hr: $412K

PREDICTED CASCADE

T+0s · lb-us-east fails health

T+12s · checkout › degraded

T+30s · orders queue backpressure

T+58s · route53 flips to us-west-2

T+90s · recovery ETA reached

Impact Analysis

Know which downstream systems are at risk before you touch anything.

Every runbook execution is preceded by a live impact analysis step. Nova walks your service dependency graph and identifies every downstream system that could be affected by the proposed remediation action, before a single command is run. Engineers see the complete risk surface, not just the immediate fix target, so no one accidentally causes a cascade while resolving the original incident.

✓ Pre-action dependency scan: dependency graph traversal runs before every runbook step, flagging at-risk downstream services
✓ Risk severity scoring: each at-risk service is scored by criticality and user impact to help engineers prioritize which risks to accept
✓ Blast radius visualization: interactive graph showing exactly which services, teams, and SLOs are in the impact zone of each action
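Risk severity scoring could look something like the sketch below. The service names come from the Impact Analysis panel on this page, but the `COUPLING_WEIGHT` table, the 1 to 5 criticality scale, the criticality values, and the label thresholds are all assumptions made for illustration; the page does not describe Nova's actual scoring formula.

```python
from dataclasses import dataclass

# Assumed weights: synchronous callers feel a restart most directly.
COUPLING_WEIGHT = {"sync": 1.0, "async": 0.4, "fire-forget": 0.2}


@dataclass
class Downstream:
    name: str
    coupling: str     # "sync", "async", or "fire-forget"
    criticality: int  # assumed scale: 1 (low) to 5 (business critical)


def risk_score(svc: Downstream) -> float:
    """Combine coupling and criticality into one comparable number."""
    return COUPLING_WEIGHT[svc.coupling] * svc.criticality


def risk_label(score: float) -> str:
    """Bucket a score into a label; thresholds are illustrative."""
    if score >= 4:
        return "high"
    if score >= 2:
        return "medium"
    return "low"


# Hypothetical criticality values for the services in the panel.
at_risk = [
    Downstream("checkout-api", "sync", 3),
    Downstream("receipts-worker", "async", 3),
    Downstream("fraud-scorer", "sync", 5),
    Downstream("notification-svc", "fire-forget", 2),
]

ranked = sorted(at_risk, key=risk_score, reverse=True)
for svc in ranked:
    print(f"{svc.name}: {risk_label(risk_score(svc))}")
```

Sorting by score puts the riskiest dependency first, so an engineer reviewing the action sees it before deciding whether to accept the risk.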

app.novaaiops.com · Impact Analysis

Pre-action impact · blast radius

Restart payment-service pods

Initiated by runbook RB-02 · severity HIGH

DOWNSTREAM AT RISK (4)

checkout-api · 30% traffic · medium

receipts-worker · async · low

fraud-scorer · sync · high

notification-svc · fire-forget · low

Write in Plain English

Describe a fix in plain English. Nova writes the runbook, with rollback.

You shouldn't need to be a YAML expert to encode operational knowledge. Describe what you want to happen in plain English, "restart the payment service pods if memory exceeds 85%, then verify the health endpoint responds before bringing traffic back", and Nova converts it into a fully structured, executable runbook with rollback steps, success criteria, and notification hooks included.

✓ Natural language authoring: describe the fix in everyday English; Nova generates the structured runbook with all steps, conditions, and checks
✓ Automatic rollback generation: every AI-authored runbook includes a rollback procedure that undoes the remediation if health checks fail post-execution
✓ Team knowledge capture: turn tribal knowledge from your senior engineers into searchable, executable runbooks that the whole team can use

app.novaaiops.com · Runbook Authoring

Author in plain English

YOU SAID

"restart payment-service pods if memory > 85%, then verify /health before re-routing traffic"

NOVA GENERATED · 4 steps + rollback

1 · when mem_pct > 85 ON payment-service

2 · drain traffic via lb.weight=0

3 · kubectl rollout restart deployment/payment

4 · wait for /health = 200 (max 60s)

rb · rollback: lb.weight=1 & page on-call
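A generated runbook like the one above can be thought of as an ordered list of steps plus a rollback list that runs if any step fails. The sketch below is one possible shape, not Nova's real schema: the `Step` and `Runbook` types and the `execute` function are hypothetical, and the actions are stubbed with lambdas instead of real load balancer or kubectl calls.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Step:
    name: str
    action: Callable[[], bool]  # returns True on success


@dataclass
class Runbook:
    trigger: str
    steps: List[Step]
    rollback: List[Step] = field(default_factory=list)


def execute(runbook: Runbook) -> List[str]:
    """Run steps in order; on the first failure, run rollback and stop."""
    log = []
    for step in runbook.steps:
        ok = step.action()
        log.append(f"{'ok' if ok else 'FAILED'}: {step.name}")
        if not ok:
            for rb in runbook.rollback:
                rb.action()
                log.append(f"rollback: {rb.name}")
            break
    return log


# Stubbed actions; the health check is forced to fail to show the rollback path.
runbook = Runbook(
    trigger="mem_pct > 85 on payment-service",
    steps=[
        Step("drain traffic via lb.weight=0", lambda: True),
        Step("kubectl rollout restart deployment/payment", lambda: True),
        Step("wait for /health = 200 (max 60s)", lambda: False),
    ],
    rollback=[
        Step("set lb.weight=1", lambda: True),
        Step("page on-call", lambda: True),
    ],
)

log = execute(runbook)
for line in log:
    print(line)
```

Keeping the rollback as data alongside the steps means the undo path is reviewable before approval, which matches the "with rollback" promise on this page.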

Goes great with AI Runbooks

Incident Hub

Autonomous Remediation

AI Agent Fleet

Your next incident is already covered
