AI Safety & Governance

Two agents fighting in a loop is not progress. The detector catches it in seconds, not days.

Livelock Detector watches for oscillation patterns: scale up, scale down, scale up, scale down. Open a feature flag, close it, open it again. Agents fighting agents. The detector recognizes the pattern, halts both sides, pages an operator, and writes a runbook with the loop reproduction. No more "we paid for autoscaling that flapped 800 times overnight."

  • Detect-and-halt latency: < 60s
  • Cycles to trigger: 3
  • Sides paused: both
  • Runbook generated: automatically
How It Detects

Three cycles, then halt

The detector keeps a rolling window per resource (per service, per flag, per IAM role) of its most recent state-change actions. When the window shows the same reversal three times in a row (state A → B → A → B → A → B), it declares a livelock. Three cycles are enough to be sure and few enough to halt before damage. Both sides are paused, and a reproduction of the loop is written to a runbook.

  • Rolling window per resource: state changes are scoped to the resource, not global, so unrelated agents do not falsely trigger
  • Three reversals = halt: two could be a normal correction; three is a pattern
  • Both sides paused: both contributing agents transition to read-only until an operator releases them
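
The detection rule above can be sketched in a few lines of Python. This is an illustrative sketch, not Nova AI Ops code: the `Change` record, the `LivelockDetector` class, and the resource keys are hypothetical names, and it assumes that one "cycle" is a single A → B pair, so three cycles fill a window of six states.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from typing import List, Optional

CYCLES_TO_TRIGGER = 3  # from the page: three cycles, then halt


@dataclass(frozen=True)
class Change:
    resource: str  # hypothetical key, e.g. "service/checkout"
    agent: str     # agent that made the change
    state: str     # state the resource was moved to


class LivelockDetector:
    def __init__(self, cycles: int = CYCLES_TO_TRIGGER) -> None:
        # Assumption: one cycle is an A -> B pair, so N cycles span 2 * N states.
        self.window_len = 2 * cycles
        # One rolling window per resource, so unrelated resources never mix.
        self.windows = defaultdict(lambda: deque(maxlen=self.window_len))

    def observe(self, change: Change) -> Optional[List[str]]:
        """Record a change; return the contributing agents if a livelock is seen."""
        window = self.windows[change.resource]
        window.append(change)
        if len(window) < self.window_len:
            return None  # not enough history yet
        states = [c.state for c in window]
        a, b = states[0], states[1]
        if a == b:
            return None  # no reversal at all
        # A livelock is a strict A, B, A, B, ... alternation across the window.
        for i, state in enumerate(states):
            if state != (a if i % 2 == 0 else b):
                return None
        return sorted({c.agent for c in window})


detector = LivelockDetector()
for i in range(6):
    agent = "autoscaler" if i % 2 == 0 else "cost-optimizer"
    state = "scaled_up" if i % 2 == 0 else "scaled_down"
    agents_in_loop = detector.observe(Change("service/checkout", agent, state))
print(agents_in_loop)
```

Because each window is keyed by resource, a burst of changes on one service cannot trigger a halt on another. A real detector would presumably also bound the window in time; that is left out here.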
Halt Behavior

Both agents go read-only on the resource

When the detector triggers, both agents are paused on the contested resource only. They keep working on other resources. The contested resource is locked from automated change until an operator reviews. This minimizes the disruption while preventing the loop from continuing.

  • Resource-scoped pause: agents are not killed; they only lose write access to the contested resource
  • Other work continues: the same agents keep handling unrelated incidents on other services
  • Lock cleared by operator: release requires a human acknowledgment that the conflict is resolved
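
A resource-scoped pause of this kind can be modeled as a small lock table. The sketch below is a hypothetical illustration (the `ResourceLockTable` class and its method names are assumptions, not the product's API): a halted resource refuses automated writes, other resources are unaffected, and release requires an explicit operator acknowledgment.

```python
class ResourceLockTable:
    """Tracks which resources are locked against automated change."""

    def __init__(self) -> None:
        self._paused = {}      # resource -> agents that were in the loop
        self.release_log = []  # (resource, operator) pairs, for audit

    def halt(self, resource: str, agents) -> None:
        # Lock only the contested resource; agents are not stopped.
        self._paused[resource] = frozenset(agents)

    def can_write(self, agent: str, resource: str) -> bool:
        # Any automated change to a locked resource is refused.
        return resource not in self._paused

    def release(self, resource: str, operator: str, conflict_resolved: bool):
        # A human must acknowledge that the conflict is resolved.
        if not conflict_resolved:
            raise PermissionError(f"{operator} has not acknowledged resolution")
        self.release_log.append((resource, operator))
        return self._paused.pop(resource, frozenset())


locks = ResourceLockTable()
locks.halt("service/checkout", ["autoscaler", "cost-optimizer"])
print(locks.can_write("autoscaler", "service/checkout"))  # contested resource
print(locks.can_write("autoscaler", "service/search"))    # unrelated resource
```

Keeping the lock keyed by resource rather than by agent is what lets the same agents continue handling unrelated work.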
Auto-Generated Runbook

A reproduction the operator can review

The detector writes a runbook capturing the loop's state machine: which agent reverses what, at which threshold, citing what evidence. The runbook is the artifact you read to understand the conflict. Most loops are resolved either by tightening one agent's trigger threshold or by adding a hysteresis band so the two agents do not contradict each other.

  • State machine captured: the runbook shows the trigger conditions and reversals as a small diagram
  • Suggested resolutions: common fixes include hysteresis bands, ownership boundaries, and escalation thresholds
  • One-click apply: recommended config changes can be applied from the runbook with operator approval
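
To make the hysteresis fix concrete, here is a minimal sketch of a scaling decision with a band between the scale-up and scale-down thresholds. The function name and the threshold values (0.75 and 0.45) are hypothetical and chosen only for illustration.

```python
def decide_replicas(cpu: float, replicas: int,
                    scale_up_at: float = 0.75,
                    scale_down_at: float = 0.45) -> int:
    """Return the new replica count for the observed CPU utilization.

    Between scale_down_at and scale_up_at nothing changes. That gap is the
    hysteresis band: a load hovering around one value cannot flip the
    decision back and forth.
    """
    if cpu > scale_up_at:
        return replicas + 1
    if cpu < scale_down_at and replicas > 1:
        return replicas - 1
    return replicas  # inside the band: hold


# Load hovering around 0.6 would flap against a single 0.6 threshold,
# but stays inside the band here, so the replica count holds.
for cpu in (0.58, 0.62, 0.58, 0.62):
    print(cpu, decide_replicas(cpu, replicas=3))
```

With a single shared threshold, one agent's scale-up can immediately satisfy the other's scale-down condition. The band removes that overlap.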
Reporting

Loops are a tuning signal, not noise

A loop a week is normal in a busy fleet. A loop a day is a tuning problem. The weekly report tracks loop count, top contested resources, top contributing agents, and the resolution rate. Use it to spot configuration mistakes before they cause customer-visible incidents.

  • Weekly loop report: count, top resources, top agents, and resolution rate, emailed every Monday
  • Contested-resource ranking: helps you see which areas of the platform have ambiguous ownership between agents
  • Per-agent contribution: agents that show up in many loops may need a tighter scope or a longer trigger window
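
The report fields listed above amount to a simple aggregation over the week's loop records. The sketch below assumes a hypothetical record shape (a dict with `resource`, `agents`, and `resolved` keys); the real report format is not described on this page.

```python
from collections import Counter


def weekly_report(loops, top_n=3):
    """Summarize one week of detected loops."""
    resources = Counter(loop["resource"] for loop in loops)
    agents = Counter(agent for loop in loops for agent in loop["agents"])
    resolved = sum(1 for loop in loops if loop["resolved"])
    return {
        "loop_count": len(loops),
        "top_resources": resources.most_common(top_n),
        "top_agents": agents.most_common(top_n),
        # None when there were no loops, to avoid dividing by zero.
        "resolution_rate": resolved / len(loops) if loops else None,
    }


week = [
    {"resource": "service/checkout", "agents": ["autoscaler", "cost-optimizer"], "resolved": True},
    {"resource": "service/checkout", "agents": ["autoscaler", "cost-optimizer"], "resolved": False},
    {"resource": "flag/new-ui", "agents": ["rollout-agent", "autoscaler"], "resolved": True},
]
print(weekly_report(week))
```

In this sample, an agent that appears in every loop rises to the top of the ranking, which is the signal the per-agent view is meant to surface.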

A loop is not effort, it is waste

Catching oscillation early is the difference between a learning system and a system that burns cloud credits all night.
