AI Safety & Governance

A big red button you can actually press: stop one agent, one tenant, or all of them

Kill Switch is the panic stop for the AI fleet. One agent acting weird? Pause it. One tenant misbehaving? Quarantine it. Whole platform looking off? Halt everything. Sub-second propagation, three scopes, fully audited. Designed to be safe to press.

  • Propagation to fleet: < 1s
  • Scopes: 3 (agent, tenant, global)
  • On kill: agents stay read-only
  • Audit: every press logged, with reason
Three Scopes

Match the blast radius to the problem

Different incidents need different scopes. A misbehaving agent gets paused at the agent scope (everyone else keeps working). A bad tenant deploy gets quarantined at the tenant scope. A platform-wide regression gets the global kill. The button only does what its scope is configured for, so you cannot accidentally halt the world.

  • Agent scope: pause one agent across all tenants, used for "this agent is generating noisy false positives"
  • Tenant scope: pause every agent for one tenant, used during a tenant-side incident or maintenance
  • Global scope: pause every agent for everyone, used during platform incidents or model regressions
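To make the three scopes concrete, here is a minimal sketch of how a kill's scope could decide which agents it covers. It is an illustration under stated assumptions, not the product's implementation: `Scope`, `Kill`, `is_paused`, and the agent and tenant names are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Scope(Enum):
    AGENT = "agent"
    TENANT = "tenant"
    GLOBAL = "global"


@dataclass(frozen=True)
class Kill:
    """Hypothetical record of one active kill."""
    scope: Scope
    target: Optional[str] = None  # agent name or tenant id; None for global


def is_paused(agent: str, tenant: str, active_kills: list[Kill]) -> bool:
    """Return True if any active kill covers this (agent, tenant) pair."""
    for kill in active_kills:
        if kill.scope is Scope.GLOBAL:
            return True  # global covers every agent for every tenant
        if kill.scope is Scope.AGENT and kill.target == agent:
            return True  # agent scope spans all tenants
        if kill.scope is Scope.TENANT and kill.target == tenant:
            return True  # tenant scope covers every agent for that tenant
    return False


kills = [Kill(Scope.AGENT, "noisy-triage"), Kill(Scope.TENANT, "acme")]
print(is_paused("noisy-triage", "globex", kills))   # True
print(is_paused("deploy-helper", "acme", kills))    # True
print(is_paused("deploy-helper", "globex", kills))  # False
```

In this sketch the check is a pure function of the active kills, so a narrower kill can never widen into a broader one by accident.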
Read-Only on Kill

Stopping is safe, never destructive

When you press kill, every running agent transitions to read-only mode. Agents can still observe, log, and reason; they just cannot execute any tool that mutates state. In-flight tool calls either finish or roll back to a checkpoint. Nothing that was healthy becomes unhealthy because you pressed kill.

  • In-flight checkpointing: long-running tool calls roll back to the last checkpoint, not their inception
  • Observability stays on: agents keep producing diagnostics so you can see why you killed them
  • No data loss: kill never drops queued signals; they wait in the queue until the un-kill
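The read-only behaviour described above can be pictured as a gate in front of every tool call. The following is a rough sketch, assuming each tool declares whether it mutates state; `Agent`, `call_tool`, `ReadOnlyError`, and the tool names are hypothetical and not taken from the product.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable


class ReadOnlyError(RuntimeError):
    """Raised when a mutating tool is invoked while the agent is killed."""


@dataclass
class Agent:
    killed: bool = False
    signals: deque = field(default_factory=deque)    # queued signals are kept on kill
    diagnostics: list = field(default_factory=list)  # observability stays on

    def call_tool(self, name: str, fn: Callable[[], object], mutates: bool):
        # Always record the attempt, even when the call is about to be blocked.
        self.diagnostics.append(f"tool={name} mutates={mutates} killed={self.killed}")
        if self.killed and mutates:
            raise ReadOnlyError(f"{name} blocked: agent is read-only")
        return fn()


agent = Agent()
agent.signals.append("cpu-high")
agent.killed = True  # the kill switch was pressed

print(agent.call_tool("read_metrics", lambda: 42, mutates=False))  # 42
try:
    agent.call_tool("restart_pod", lambda: "restarted", mutates=True)
except ReadOnlyError as err:
    print(err)  # restart_pod blocked: agent is read-only
print(len(agent.signals))  # 1
```

Checkpoint rollback for in-flight calls is left out of the sketch; it only shows that reads and diagnostics continue while mutations are refused and queued signals stay in place.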
When to Press It

Three patterns we see in practice

Teams reach for the kill switch in three situations: (1) a model upgrade from your provider introduced regressions and the fleet is over-acting, (2) you are running a chaos game day and want to go humans-only for an hour, (3) a downstream provider (cloud, monitoring, paging) is degraded and you want to stop acting on stale data.

  • Pattern 1, model regression: press global kill the moment success rate drops below your threshold, and hold until you re-pin a model
  • Pattern 2, game day: press tenant kill on the staging tenant, run the game day, and release when done
  • Pattern 3, data plane degraded: press global kill, wait for the upstream to recover, then release so the agents resume from a clean signal
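For Pattern 1, the "success rate drops below your threshold" check might look like the sketch below. The 0.90 threshold, the 20-sample minimum, and the function names are assumptions chosen for illustration; the page does not specify how the threshold is evaluated.

```python
def success_rate(outcomes: list[bool]) -> float:
    """Fraction of recent agent actions that succeeded."""
    return sum(outcomes) / len(outcomes)


def should_press_global_kill(
    outcomes: list[bool],
    threshold: float = 0.90,  # assumed threshold, tune to your own baseline
    min_samples: int = 20,    # assumed guard against reacting to tiny samples
) -> bool:
    """Return True when the recent success rate has fallen below the threshold."""
    if len(outcomes) < min_samples:
        return False
    return success_rate(outcomes) < threshold


healthy = [True] * 19 + [False]       # 95% success
regressed = [True] * 15 + [False] * 5  # 75% success
print(should_press_global_kill(healthy))    # False
print(should_press_global_kill(regressed))  # True
```

The minimum-sample guard is a design choice in this sketch so that a handful of early failures does not trigger a global kill on its own.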
Audit & Recovery

Every press leaves a paper trail

Pressing kill writes a row to Agent Ledger with the operator id, the scope, the reason (optional for agent and tenant scopes, required for global), and the duration. Releasing kill records the warm-restart sequence. Use the report view to see how often you press each scope and whether you are converging on stability or hitting the same wall every week.

  • Required reason on global: global kill requires a free-text reason so the post-incident review has the why
  • Warm-restart on release: agents come back with a 60s ramp so you do not slam the data plane on un-kill
  • Weekly trend report: kill count per scope, mean dwell time, top reasons, emailed to platform-admin
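The audit fields and the warm restart can be sketched together. The fields on `LedgerRow` mirror the ones listed above, but the class itself, the linear shape of the ramp, and the `weekly_trend` helper are assumptions for illustration only; the page states just that the ramp lasts 60s.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional

RAMP_SECONDS = 60.0  # from the page: agents come back with a 60s ramp


@dataclass(frozen=True)
class LedgerRow:
    """Hypothetical shape of one kill entry in the audit ledger."""
    operator_id: str
    scope: str             # "agent", "tenant", or "global"
    reason: Optional[str]  # optional, except on global
    duration_s: float

    def __post_init__(self):
        if self.scope == "global" and not (self.reason and self.reason.strip()):
            raise ValueError("global kill requires a free-text reason")


def ramp_fraction(seconds_since_release: float) -> float:
    """Share of normal throughput allowed after release, assuming a linear ramp."""
    return max(0.0, min(1.0, seconds_since_release / RAMP_SECONDS))


def weekly_trend(rows: list[LedgerRow]) -> dict:
    """Kill count and mean dwell time (seconds) per scope."""
    durations = defaultdict(list)
    for row in rows:
        durations[row.scope].append(row.duration_s)
    return {
        scope: {"count": len(d), "mean_dwell_s": sum(d) / len(d)}
        for scope, d in durations.items()
    }


rows = [
    LedgerRow("op-1", "tenant", None, 600.0),
    LedgerRow("op-2", "tenant", "game day", 1200.0),
    LedgerRow("op-1", "global", "model regression", 300.0),
]
print(ramp_fraction(15))           # 0.25
print(weekly_trend(rows)["tenant"])  # {'count': 2, 'mean_dwell_s': 900.0}
```

Validating the reason when the row is created means a global kill without a reason fails before anything is written, which matches the "required reason on global" rule.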

Confidence comes from being able to stop

Adopting AI for ops feels safer when you know exactly how to halt it. Kill Switch is that lever, and pressing it never breaks anything.
