Agentic SRE Advanced
By Samson Tanimawo, PhD · Published Jun 10, 2026 · 5 min read

Debugging an Agent That Made the Wrong Call

The five-question debug rubric: was the tool result wrong? The prompt missing context? The model confused? The plan flawed? The output mis-parsed? Asked in this order, the first question that answers "yes" usually names the bug.

The five-question rubric

1. Was the tool result correct? Compare what the tool returned with reality.

2. Did the prompt have the right context? The model can only reason about what it sees; missing context produces wrong calls.

3. Was the model confused? Sometimes the model just gets it wrong; this is the residual after the others are ruled out.

4. Was the plan flawed? The model might have correctly executed a wrong plan.

5. Was the output mis-parsed? The model might have produced the right answer, only for the parser to misread it; see the sketch after this list.
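The rubric, as an ordered checklist in code. A minimal sketch, assuming you have already answered each question by inspection; Failure, Findings, and classify are illustrative names, not any framework's API. Step 3 has no direct check of its own, it is the residual, so the code tests steps 1, 2, 4, and 5 and returns model confusion when none of them fail.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Failure(Enum):
    BAD_TOOL_RESULT = auto()   # step 1
    MISSING_CONTEXT = auto()   # step 2
    MODEL_CONFUSION = auto()   # step 3 (the residual)
    FLAWED_PLAN = auto()       # step 4
    MISPARSED_OUTPUT = auto()  # step 5


@dataclass
class Findings:
    tool_result_matched_reality: bool
    prompt_had_needed_context: bool
    plan_was_sound: bool
    output_parsed_faithfully: bool


def classify(f: Findings) -> Failure:
    """Walk the rubric in order; the first failing check names the bug."""
    if not f.tool_result_matched_reality:
        return Failure.BAD_TOOL_RESULT
    if not f.prompt_had_needed_context:
        return Failure.MISSING_CONTEXT
    if not f.plan_was_sound:
        return Failure.FLAWED_PLAN
    if not f.output_parsed_faithfully:
        return Failure.MISPARSED_OUTPUT
    # Steps 1, 2, 4, and 5 ruled out: model confusion is what remains.
    return Failure.MODEL_CONFUSION
```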

Why this order

Most agent bugs are in steps 1 and 2: bad data or missing context. Together they account for roughly 70% of cases.

Step 3 (model confusion) is the residual. Fixing it requires prompt or model changes.

Steps 4 and 5 are infrastructure bugs. Less common but harder to diagnose without the rubric to direct attention.

Artefacts you need to debug

The full prompt sent to the model. The full response. The tool calls and responses. The decision the agent made.
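A minimal sketch of a per-decision trace record capturing exactly these artefacts; the field names are illustrative, not any particular framework's schema.

```python
from dataclasses import dataclass, field


@dataclass
class ToolCall:
    name: str
    arguments: dict
    response: str  # what the agent actually saw, verbatim


@dataclass
class AgentTrace:
    prompt: str                 # the full prompt sent to the model
    response: str               # the full, raw model response
    tool_calls: list[ToolCall] = field(default_factory=list)
    decision: str = ""          # the call the agent ultimately made
```

Logged at every decision point, a record like this makes steps 1, 2, and 5 checkable by reading the trace instead of re-running the agent.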

Logs from the surrounding infrastructure. Sometimes the bug is in the tool, not the agent; the only way to know is to compare the tool's logs with the agent's view of them.

The eval case (or production case) reproducer. Without a reproducer, debugging is guesswork.
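A minimal sketch of a reproducer built from the AgentTrace record above. It assumes a single-decision agent_step(prompt, call_tool) entry point; that name is hypothetical and stands in for whatever your agent loop exposes.

```python
def replay(trace: AgentTrace, agent_step) -> str:
    """Re-run one decision deterministically by stubbing tools with recorded responses."""
    recorded = {(c.name, repr(c.arguments)): c.response for c in trace.tool_calls}

    def call_tool(name: str, arguments: dict) -> str:
        # Serve exactly the response the tool gave during the original run.
        return recorded[(name, repr(arguments))]

    return agent_step(prompt=trace.prompt, call_tool=call_tool)
```

With tool responses pinned to the recording, the replay is deterministic up to model sampling, which is what turns debugging from guesswork into comparison.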

Once you know which step is wrong

Step 1: fix the tool, fix the wrapper, or add validation.

Step 2: add the missing context to the prompt or the agent's input.

Step 3: rephrase the prompt or switch models.

Step 4: refactor the planning logic.

Step 5: tighten the parser or use structured output, as in the sketch below.
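As an example of the step 5 fix, a minimal sketch of a parser that fails loudly instead of guessing; the decision vocabulary is illustrative.

```python
import json


class OutputParseError(Exception):
    """Raised when the model's output cannot be read unambiguously."""


ALLOWED_DECISIONS = {"page_oncall", "silence_alert", "escalate"}  # illustrative


def parse_decision(raw: str) -> str:
    """Parse the agent's decision; raise rather than guess on malformed output."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as e:
        raise OutputParseError(f"output is not valid JSON: {e}") from e
    decision = data.get("decision") if isinstance(data, dict) else None
    if decision not in ALLOWED_DECISIONS:
        raise OutputParseError(f"unexpected decision: {decision!r}")
    return decision
```

A loud parse failure turns a step 5 bug from a silent wrong call into an immediate, attributable error.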

Add an eval case after the fix

Whatever caused the wrong call now has a reproducer. Convert the reproducer into an eval case.

Run the case before the fix: it should fail, proving it captures the bug. Run it after: it should pass, proving the fix works. Commit the case.
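A minimal sketch of the committed case as a pytest test, reusing the replay() harness above; load_trace and the trace path are hypothetical stand-ins for however you persist traces.

```python
def test_wrong_call_makes_right_decision():
    # The recorded trace is the reproducer; the path is illustrative.
    trace = load_trace("evals/cases/wrong_call_case.json")
    decision = replay(trace, agent_step)
    # Before the fix this assertion failed; after the fix it must stay green.
    assert decision == "page_oncall"
```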

Future regressions on this case will be loud. The eval suite has gained one more useful case.