AI Agent Operations

Strengths and weaknesses, agent by agent, so you tune the right thing

Agent Audit is the per-agent diagnostic. For each agent, the page reports its top success classes, top failure classes, top tools used, and the failure modes that recur. Use it to decide which agents to promote, which to demote, and which need their prompt or schema tuned.

Get Started · Talk to Sales
app.novaaiops.com / agent-audit
● LIVE
4 audit dimensions
Per-class success and failure rates
Tuning recommendations
Weekly audit refresh
Class-Level Breakdown

Not just "this agent is 92% successful"

Aggregate trust scores hide which classes an agent is good at and which it is bad at. The audit page breaks the score down per class: a postgres-doctor that is 98% on deadlocks, 94% on slow queries, and 62% on vacuum-on-partitioned shows up as a 92% aggregate that hides a clear weakness on one class. The class breakdown is what makes the data actionable.

  • Per-class success rate: every action class the agent has tried, with verified success rate and sample size
  • Sample size shown: a 50% rate on 2 attempts is noise, while a 50% rate on 200 attempts is signal; the page tells you which you are looking at
  • Trend per class: each class has its own trend so you can spot newly-regressed classes early
app.novaaiops.com / agent-audit · classes
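
The signal-vs-noise distinction above is simple enough to sketch. Assuming each verified attempt is exported as a record with an action class and a verified outcome (the field names below are illustrative, not the product's actual export schema), a per-class breakdown with a Wilson lower bound shows why 1-of-2 and 100-of-200 are very different 50% rates:

```python
import math
from collections import defaultdict

def wilson_lower_bound(successes: int, attempts: int, z: float = 1.96) -> float:
    """Lower bound of the Wilson score interval: discounts small samples."""
    if attempts == 0:
        return 0.0
    p = successes / attempts
    denom = 1 + z**2 / attempts
    center = p + z**2 / (2 * attempts)
    margin = z * math.sqrt((p * (1 - p) + z**2 / (4 * attempts)) / attempts)
    return (center - margin) / denom

def per_class_breakdown(attempts: list[dict]) -> dict:
    """Group verified attempts by action class and score each class."""
    by_class = defaultdict(lambda: {"attempts": 0, "successes": 0})
    for a in attempts:
        row = by_class[a["action_class"]]
        row["attempts"] += 1
        row["successes"] += int(a["verified_success"])  # bool or 0/1
    return {
        cls: {
            **row,
            "rate": row["successes"] / row["attempts"],
            "rate_lower_bound": wilson_lower_bound(row["successes"], row["attempts"]),
        }
        for cls, row in by_class.items()
    }

# A 50% rate on 2 attempts and on 200 attempts look identical as raw rates,
# but the lower bound makes the difference obvious:
print(wilson_lower_bound(1, 2))      # ~0.09 -> noise
print(wilson_lower_bound(100, 200))  # ~0.43 -> signal
```

The Wilson bound is just one common way to discount thin samples; any interval method gives the same qualitative ordering of "noise" versus "signal."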
Failure-Mode Patterns

Recurring failures, named

Failures are clustered into named patterns. "Schema confusion on partitioned tables." "Cache miss assumption when warm cache." "Wrong service in service-graph traversal." Each pattern is a clue the agent's prompt or context is missing something. The fix is usually one prompt edit, not a full re-train.

  • Named patterns: failures are clustered and named, not left as opaque error counts
  • Linked to bundles: every pattern lists the contributing decision bundles so you can read the prompts that failed
  • Suggested fix: common patterns ship with a recommended prompt or schema fix
app.novaaiops.com / agent-audit · patterns
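
For a feel of what "clustered and named" means in practice, here is a deliberately simplified sketch: bucket failed decision bundles by matching their failure reasons against named rules, and keep the bundle ids so every pattern links back to the prompts behind it. The rules and field names are illustrative assumptions, not the product's actual clustering logic.

```python
import re
from collections import defaultdict

# Illustrative rules only -- real clustering need not be keyword matching;
# this just shows the shape of the output.
PATTERN_RULES = {
    "Schema confusion on partitioned tables": re.compile(r"partition|child table", re.I),
    "Cache miss assumption when warm cache": re.compile(r"cold cache|cache miss", re.I),
    "Wrong service in service-graph traversal": re.compile(r"wrong service|unrelated service", re.I),
}

def cluster_failures(failures: list[dict]) -> dict:
    """Bucket failed bundles into named patterns, keeping bundle ids so each
    pattern links back to the prompts that produced it."""
    patterns = defaultdict(list)
    for f in failures:
        for name, rule in PATTERN_RULES.items():
            if rule.search(f["failure_reason"]):
                patterns[name].append(f["bundle_id"])
                break
        else:
            patterns["Unclassified"].append(f["bundle_id"])
    return {name: {"count": len(ids), "bundles": ids} for name, ids in patterns.items()}
```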
Tool Usage

Which tools, how often, with what success

For each agent, see which tools it uses, how often, and what its tool-call success rate is. A low success rate on a tool (rejected by the Action Schema Validator, or always returning errors) points to either a brittle tool or an agent prompt that calls it wrong. Both are findable from the table.

  • Tool usage table: every tool the agent has called, with count, success rate, and median duration
  • Reject reason breakdown: when validator rejections fire, the table shows which validation kind failed (type, range, required)
  • Cross-reference: click any tool row to filter the audit to that tool only across all agents
app.novaaiops.com / agent-audit · tools
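
A minimal sketch of the aggregation behind such a table, assuming tool calls are exported with a tool name, outcome, optional reject kind, and duration (hypothetical field names, not the real export):

```python
from collections import Counter, defaultdict
from statistics import median

def tool_usage_table(tool_calls: list[dict]) -> list[dict]:
    """Aggregate an agent's tool calls into one row per tool."""
    by_tool = defaultdict(list)
    for call in tool_calls:
        by_tool[call["tool"]].append(call)

    rows = []
    for tool, calls in by_tool.items():
        successes = sum(1 for c in calls if c["outcome"] == "success")
        rejects = Counter(
            c["reject_kind"] for c in calls if c["outcome"] == "validator_reject"
        )
        rows.append({
            "tool": tool,
            "calls": len(calls),
            "success_rate": successes / len(calls),
            "median_duration_ms": median(c["duration_ms"] for c in calls),
            "reject_reasons": dict(rejects),  # e.g. {"type": 3, "range": 1}
        })
    # Lowest success rate floats to the top: either a brittle tool or a
    # prompt that calls it wrong -- both worth a look.
    return sorted(rows, key=lambda r: r["success_rate"])
```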
Tuning Recommendations

Concrete next actions per agent

The audit page closes with concrete recommendations: edit the system prompt to add X, demote the agent on class Y, narrow the agent's scope to exclude Z. Each recommendation is one PR or one config change away from being a real improvement. The list is capped at five; the page picks the top wins, not every possible nit.

  • Top 5 max: no overwhelming list; only the highest-impact fixes shown per audit
  • Concrete: "add partition awareness to line 14 of the system prompt", not "improve the prompt"
  • Estimated impact: each recommendation includes an estimated % impact on the agent's overall success rate
app.novaaiops.com / agent-audit · recommend
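
The estimated-impact number has a straightforward intuition: the lift from fixing one weak class is capped by how much that class can improve, weighted by how often the class occurs. A rough sketch, with made-up attempt counts that roughly match the postgres-doctor example above (not the page's actual scoring model):

```python
def estimated_impact_pct(class_stats: dict[str, dict], cls: str,
                         assumed_fixed_rate: float = 0.95) -> float:
    """Rough lift (in percentage points) to the agent's overall success rate
    if one weak class were brought up to `assumed_fixed_rate`."""
    total_attempts = sum(s["attempts"] for s in class_stats.values())
    weak = class_stats[cls]
    weight = weak["attempts"] / total_attempts
    return max(0.0, assumed_fixed_rate - weak["rate"]) * weight * 100

# Illustrative numbers: 92% aggregate hiding a 62% class.
stats = {
    "deadlocks":              {"attempts": 114, "rate": 0.98},
    "slow-queries":           {"attempts": 60,  "rate": 0.94},
    "vacuum-on-partitioned":  {"attempts": 26,  "rate": 0.62},
}
print(estimated_impact_pct(stats, "vacuum-on-partitioned"))  # ~4.3 points
```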
Video walkthrough coming soon

Subscribe to Nova AI Ops on YouTube for demos, tutorials, and feature deep-dives.

Specialists improve when you can see what they are bad at

Audit turns "this agent is unreliable" into "this agent fails on partitioned tables, here is the prompt fix."

Get Started · Request a Demo