When SRE Agents Hallucinate Tool Output (and How to Detect It)
Agents sometimes invent results that no tool actually returned. The detection harness, the most common provocations, and the prompt-level fixes that work.
How it happens
The model is trained to produce confident output. When a tool returns nothing, the model often fills the gap rather than reporting the empty result.
Vague tool descriptions invite hallucination. "Returns metric data" is vague; "returns a JSON object with fields x, y, z, or null on no data" is specific.
Long contexts make hallucination worse. The model loses track of what was actually fetched and what it inferred.
Detection harness
Wrap every tool call. Log: tool name, args, return value, return time. Pair the log with the model's subsequent output.
Run a daily job: for each agent run, find statements in the model output that reference tool data. Cross-check against the actual tool returns. Mismatches are flagged.
Mismatches are reviewed by a human at first. Once the harness is calibrated (low false-positive rate), it can auto-flag without review.
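The wrapper-plus-cross-check loop can be sketched in a few lines. This is a minimal illustration, assuming Python; the `RunLog` and `flag_mismatches` names are invented here, and the substring match stands in for whatever claim-extraction the real harness uses:

```python
import json
import time
from dataclasses import dataclass, field

@dataclass
class ToolCall:
    # One logged invocation: tool name, args, return value, return time.
    tool_name: str
    args: dict
    returned: object
    returned_at: float

@dataclass
class RunLog:
    calls: list = field(default_factory=list)

    def wrap(self, tool_name, fn):
        # Wrap a tool so every call is logged alongside its result.
        def logged(**kwargs):
            result = fn(**kwargs)
            self.calls.append(ToolCall(tool_name, kwargs, result, time.time()))
            return result
        return logged

def flag_mismatches(log: RunLog, claimed_values: list) -> list:
    # Flag model claims that appear in no actual tool return.
    # Substring matching is a placeholder for real claim extraction.
    returned_text = " ".join(json.dumps(c.returned) for c in log.calls)
    return [v for v in claimed_values if v not in returned_text]
```

The daily job would extract `claimed_values` from each run's model output and route non-empty mismatch lists to a reviewer.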
Common provocations
Asking the model to summarise tool output that was empty. Fix: in the prompt, tell the model what to do when output is empty ("say so explicitly").
Asking the model to combine output from multiple tools. The model may attribute facts to the wrong source. Fix: structured output that names the source.
Asking the model to extrapolate. "Based on the trend, what is next week's value?" invites the model to fabricate a trend if the data did not show one.
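The empty-output fix can be applied before the model ever sees the result, by making emptiness explicit in the rendered prompt. A sketch, assuming Python; `render_tool_output` is a hypothetical helper, not a library API:

```python
import json

def render_tool_output(tool_name: str, output) -> str:
    # Make empty results explicit so the model reports them
    # instead of filling the gap with invented data.
    if output is None or output == [] or output == {}:
        return (f"[{tool_name}] returned NO DATA. "
                "Say so explicitly; do not infer or extrapolate values.")
    return f"[{tool_name}] returned: {json.dumps(output)}"
```

Naming the tool in every rendered line also helps with the multi-tool case: each fact the model repeats is visibly anchored to one source.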
Prompt-level fixes
Add an explicit clause: "Only use information that came from a tool call. If a tool returned no data, say so."
Show the model the raw tool output, not a summary. Summarisation is where hallucinations creep in.
Use structured output with a "sources" field. The model has to attribute each fact to a source; missing source raises a flag.
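The sources-field check can be enforced mechanically rather than by inspection. A sketch, assuming Python; the output schema (a `facts` list with a per-fact `source` field) is illustrative:

```python
def flag_unsourced_facts(output: dict, logged_tools: set) -> list:
    # A fact with a missing source, or a source naming a tool that
    # was never actually called, gets flagged for review.
    flagged = []
    for fact in output.get("facts", []):
        if fact.get("source") not in logged_tools:
            flagged.append(fact)
    return flagged
```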
The cost of detection
Detection is mostly compute, not engineering. The harness adds 10-20% to per-run cost.
The cost is recovered in caught regressions. Each hallucination caught early prevents a downstream incident.
Disable detection in production after the harness has been calibrated and the agent is stable. Re-enable on every prompt change. Treat it like a regression-detection layer that is on for risky changes.
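The enable/disable policy above reduces to a small decision rule. A sketch, assuming Python, with the flag names invented for illustration:

```python
def detection_enabled(prompt_changed: bool,
                      harness_calibrated: bool,
                      agent_stable: bool) -> bool:
    # Always re-enable on a prompt change; otherwise run the harness
    # only until it is calibrated and the agent is stable.
    if prompt_changed:
        return True
    return not (harness_calibrated and agent_stable)
```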