AI Agent Operations

Every running instance on one screen, with the heartbeat and the queue depth

Instances is the running-process view of the agent fleet. Each agent type runs as one or more instances across regions. This page shows them all, with CPU, memory, queue depth, last heartbeat, version pin, and owning region for each. Use it to spot a stuck instance, an under-provisioned region, or an instance running an outdated version.

  • Per-region visibility
  • < 10s heartbeat freshness
  • Per-instance KPIs
  • One-click restart

What's Tracked

Six metrics per instance

For each instance, the page reports CPU usage, memory usage, queue depth (tasks waiting for this instance), p95 task latency, last heartbeat time, and the agent version pin. The data refreshes every 5 seconds. Color coding is consistent with the rest of the platform: each metric shows green, yellow, or red depending on which threshold it has crossed.

  • CPU + memory: per-process resource usage on the host
  • Queue depth + p95: how busy is this instance and how slow are its tasks getting
  • Heartbeat + version pin: is it alive and what version is it running
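
As a rough illustration of the threshold coloring described above, here is a minimal Python sketch. The field names, the `THRESHOLDS` table, and every numeric cutoff are assumptions made up for this example; the page does not publish its actual thresholds or data model.

```python
from dataclasses import dataclass

# Hypothetical cutoffs, for illustration only: (yellow at, red at).
THRESHOLDS = {
    "cpu_pct": (70.0, 90.0),
    "memory_pct": (75.0, 90.0),
    "queue_depth": (50, 200),
    "p95_latency_ms": (500.0, 2000.0),
    "heartbeat_age_s": (10.0, 30.0),
}


@dataclass
class InstanceSample:
    """One refresh of the per-instance metrics (assumed shape)."""
    instance_id: str
    cpu_pct: float
    memory_pct: float
    queue_depth: int
    p95_latency_ms: float
    heartbeat_age_s: float  # seconds since the last heartbeat
    version: str            # the agent version pin


def metric_color(name, value):
    """Map a single metric value to green, yellow, or red."""
    yellow_at, red_at = THRESHOLDS[name]
    if value >= red_at:
        return "red"
    if value >= yellow_at:
        return "yellow"
    return "green"


def instance_colors(sample):
    """Color every thresholded metric of one instance."""
    return {name: metric_color(name, getattr(sample, name)) for name in THRESHOLDS}


def worst_color(colors):
    """Overall status is the most severe color among the metrics."""
    severity = {"green": 0, "yellow": 1, "red": 2}
    return max(colors.values(), key=severity.__getitem__)
```

Under these assumed cutoffs, an instance with healthy CPU but a 42-second-old heartbeat would show red overall, which is the "stuck instance" case the page is meant to surface.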
One-Click Restart

Stuck instance? Restart it

Stuck instances happen. The page has a one-click restart per instance with graceful semantics: the instance drains its queue (or hands work to a sibling), a replacement starts, and only then does the old instance exit. The whole loop usually takes under 30 seconds. Each restart is logged in Agent Ledger and Audit Logs so the postmortem has the trail.

  • Graceful drain first: in-flight tasks finish or are handed off; no orphan work
  • Auto-replacement: a fresh instance starts before the old one fully exits; no capacity gap
  • Logged with operator id: restart events show up in the audit log with who pressed the button
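
The restart sequence above can be sketched as a small in-memory simulation. This is not the platform's implementation: the `Instance` class, the `graceful_restart` function, the audit record fields, and the choice to hand work to the least-loaded sibling are all assumptions for illustration.

```python
from collections import deque
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Instance:
    """Hypothetical stand-in for a running agent instance."""
    instance_id: str
    queue: deque = field(default_factory=deque)


def graceful_restart(fleet, instance_id, new_id, operator_id, audit_log):
    """Replace one instance without orphaning work or dropping capacity.

    fleet is a dict of instance_id -> Instance; audit_log is a plain list.
    """
    old = fleet[instance_id]
    siblings = [i for i in fleet.values() if i.instance_id != instance_id]

    # Start the replacement before the old instance exits: no capacity gap.
    fresh = Instance(new_id)
    fleet[new_id] = fresh

    # Drain: hand queued tasks to the least-loaded sibling,
    # or to the replacement when there is no sibling.
    target = min(siblings, key=lambda i: len(i.queue)) if siblings else fresh
    handed_off = len(old.queue)
    target.queue.extend(old.queue)
    old.queue.clear()

    # Only now does the old instance leave the fleet.
    del fleet[instance_id]

    # Record who pressed the button, for the postmortem trail.
    audit_log.append({
        "event": "restart",
        "instance_id": instance_id,
        "replacement_id": new_id,
        "operator_id": operator_id,
        "tasks_handed_off": handed_off,
        "at": datetime.now(timezone.utc).isoformat(),
    })
    return fresh
```

The point of the ordering is that the old instance is removed last, after its queue is empty and its replacement exists, so a failed restart never leaves tasks without an owner.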
Region Balance

Spot under-provisioned regions

Each region's instance count is shown alongside its load. A region with high queue depth and no spare instances is under-provisioned. The page surfaces this directly: "us-east is at 92% saturation across 4 instances, eu-west has 1 instance at 38%." Recommendations include a one-click "scale us-east +2" action.

  • Per-region rollup: instance count and average load per region; spot the saturated and the empty
  • Recommendation: when a region is consistently saturated, the page recommends the scale-up
  • One-click scale: apply the recommendation directly; passes through Approval Queue if the action class requires
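
One way to picture the per-region rollup and the scale-up recommendation is the sketch below. The saturation limit, the target load, and the sizing formula are assumptions chosen for this example, not the product's actual logic. It also works from a single snapshot, whereas the page recommends a scale-up only when a region is consistently saturated.

```python
import math
from collections import defaultdict

SATURATION_LIMIT = 0.85  # hypothetical: average load at which a region counts as saturated
TARGET_LOAD = 0.65       # hypothetical: average load to aim for after scaling


def region_rollup(instances):
    """Group (region, load) pairs into instance count and average load per region.

    load is a fraction between 0.0 and 1.0.
    """
    loads = defaultdict(list)
    for region, load in instances:
        loads[region].append(load)
    return {
        region: {"count": len(values), "avg_load": sum(values) / len(values)}
        for region, values in loads.items()
    }


def recommend_scale_up(rollup):
    """Return region -> number of instances to add, for saturated regions only."""
    recommendations = {}
    for region, stats in rollup.items():
        if stats["avg_load"] >= SATURATION_LIMIT:
            total_load = stats["count"] * stats["avg_load"]
            needed = math.ceil(total_load / TARGET_LOAD)
            recommendations[region] = needed - stats["count"]
    return recommendations
```

With the figures from the example above (four us-east instances at 92% and one eu-west instance at 38%), these assumed constants yield a recommendation of two more instances for us-east and nothing for eu-west.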
Version Drift

Catch instances running stale code

Each instance carries a version pin. Mixed-version fleets cause subtle bugs. The page highlights version drift: when most instances are on v12 but one is on v11, or the other way around, the outlier is flagged. Drift detection runs continuously, and flagged instances also appear in the daily report.

  • Version pin per instance: every instance reports its agent version; mismatches surface immediately
  • Outlier flagging: instances on a different version from the fleet majority are visibly flagged
  • Daily report: version drift events captured in the daily ops report so they do not linger
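
Outlier flagging against the fleet majority can be sketched in a few lines of Python. The function name and the input shape (a dict of instance id to version pin) are assumptions for illustration; how the product itself resolves ties or staged rollouts is not stated here.

```python
from collections import Counter


def find_version_outliers(pins):
    """Return (majority_version, sorted ids of instances on any other version).

    pins maps instance id -> reported agent version.
    On a tie, Counter.most_common picks the version encountered first.
    """
    if not pins:
        return None, []
    counts = Counter(pins.values())
    majority, _ = counts.most_common(1)[0]
    outliers = sorted(iid for iid, version in pins.items() if version != majority)
    return majority, outliers
```

For example, `find_version_outliers({"a-1": "v12", "a-2": "v12", "a-3": "v11", "a-4": "v12"})` returns `("v12", ["a-3"])`, the single instance still on the older version.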

A process list for the AI fleet

Agents are processes. Processes get stuck. Instances is the page that shows you which one and lets you restart it without ssh.
