Agent Audit is the per-agent diagnostic. For each agent, the page reports its top success classes, top failure classes, top tools used, and the failure modes that recur. Use it to decide which agents to promote, which to demote, and which need their prompt or schema tuned.
Aggregate trust scores hide which classes an agent handles well and which it handles poorly. The audit page breaks the score down per class: a postgres-doctor that scores 98% on deadlocks, 94% on slow queries, and 62% on vacuum-on-partitioned shows up as a 92% aggregate, with a clear weakness on one class. The class breakdown is what makes the data actionable.
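A minimal sketch of why a volume-weighted aggregate hides the weak class. All rates and counts below are hypothetical illustration values, not product output:

```python
# Hypothetical per-class success rates and incident counts for an
# agent like postgres-doctor (assumed numbers for illustration).
classes = {
    "deadlocks":             {"rate": 0.98, "count": 500},
    "slow-queries":          {"rate": 0.94, "count": 380},
    "vacuum-on-partitioned": {"rate": 0.62, "count": 120},
}

# The volume-weighted aggregate looks healthy...
total = sum(c["count"] for c in classes.values())
aggregate = sum(c["rate"] * c["count"] for c in classes.values()) / total
print(f"aggregate: {aggregate:.0%}")  # -> aggregate: 92%

# ...while the per-class view surfaces what it hides.
weak = [name for name, c in classes.items() if c["rate"] < 0.80]
print(f"weak classes: {weak}")  # -> weak classes: ['vacuum-on-partitioned']
```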
Failures are clustered into named patterns. "Schema confusion on partitioned tables." "Cache miss assumption when warm cache." "Wrong service in service-graph traversal." Each pattern is a clue the agent's prompt or context is missing something. The fix is usually one prompt edit, not a full re-train.
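A sketch of what the clustering produces. A production system would likely group failure transcripts by embedding similarity; here, hypothetical keyword rules and invented failure strings stand in to show the shape of the output:

```python
from collections import Counter

# Hypothetical mapping from keyword to named failure pattern
# (a stand-in for real transcript clustering).
PATTERNS = {
    "partition": "Schema confusion on partitioned tables",
    "cache": "Cache miss assumption when warm cache",
    "service": "Wrong service in service-graph traversal",
}

def label(failure_text: str) -> str:
    """Assign a failure transcript to a named pattern."""
    for keyword, pattern in PATTERNS.items():
        if keyword in failure_text.lower():
            return pattern
    return "Unclustered"

# Invented failure summaries for illustration.
failures = [
    "agent treated a partitioned parent as a plain table",
    "agent assumed the cache was cold; it was warm",
    "pulled metrics from the wrong service in the graph",
    "ignored the partition key in generated DDL",
]

recurring = Counter(label(f) for f in failures)
print(recurring.most_common())
```

A pattern that recurs across many incidents is the one worth a prompt edit first.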
For each agent, the table shows which tools it uses, how often, and its tool-call success rate. A tool with a low success rate (calls rejected by the Action Schema Validator, or calls that always return errors) points to either a brittle tool or an agent prompt that calls the tool incorrectly. Both are findable from the table.
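The per-tool success rate is a simple fold over the call log. A sketch with a hypothetical log and invented tool names:

```python
from collections import defaultdict

# Hypothetical tool-call log: (tool name, call succeeded?). A real log
# would also record why a call failed (schema rejection vs runtime error).
calls = [
    ("pg_explain", True), ("pg_explain", True), ("pg_explain", True),
    ("lock_inspector", True), ("lock_inspector", False),
    ("partition_info", False), ("partition_info", False), ("partition_info", True),
]

stats = defaultdict(lambda: [0, 0])  # tool -> [successes, total]
for tool, ok in calls:
    stats[tool][1] += 1
    if ok:
        stats[tool][0] += 1

rates = {tool: s / n for tool, (s, n) in stats.items()}
# Flag tools failing more often than they succeed, for closer inspection.
flagged = [t for t, r in rates.items() if r < 0.5]
print(rates)    # per-tool success rates
print(flagged)  # -> ['partition_info']
```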
The audit page closes with concrete recommendations: edit the system prompt to add X, demote the agent on class Y, narrow the agent's scope to exclude Z. Each recommendation is one PR or one config change away from being a real improvement. The list always has fewer than five items: the page picks the top wins, not every possible nit.
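The selection rule reduces to rank-and-cap. A sketch with hypothetical candidate fixes and assumed impact scores (these are not product output):

```python
# Hypothetical candidate fixes, each scored by an assumed estimate of
# failures it would address per week.
candidates = [
    ("Add partitioned-table guidance to the system prompt", 41),
    ("Demote the agent on class vacuum-on-partitioned", 38),
    ("Narrow scope to exclude service-graph traversal", 12),
    ("Tighten one tool-call schema", 5),
    ("Reword one tool description", 2),
    ("Fix a typo in the agent card", 1),
]

# Keep only the top wins: sort by estimated impact, cap below five items.
recommendations = [name for name, _score in
                   sorted(candidates, key=lambda c: c[1], reverse=True)[:4]]
for r in recommendations:
    print("-", r)
```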
Audit turns "this agent is unreliable" into "this agent fails on partitioned tables, here is the prompt fix."