Agentic SRE Advanced
By Samson Tanimawo, PhD
Published Jul 28, 2026 · 5 min read

Single-Shot vs Iterative Agents for Incident Response

Some incidents need one model call with the right context. Others need iterative reasoning over many turns. Here is the cost and latency math for picking the right shape per incident type.

When single-shot wins

A single-shot agent makes one model call, returns the answer, exits. It is the right shape when the input is small enough to fit in one prompt and the answer is reachable in one reasoning step.

Most triage tasks are single-shot. The alert payload, the metrics, the recent changes: all of it fits in 4k tokens. The model reasons and returns a hypothesis. Done in two seconds.

Single-shot is the cheapest option by an order of magnitude. Latency is one model call. Cost is one model call. Failure modes are one model call. If you can shape the task as single-shot, do.
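In code, the single-shot shape is just one prompt and one call. A minimal sketch, where `call_model` is a hypothetical stand-in for your LLM client and the prompt template is illustrative:

```python
def call_model(prompt: str) -> str:
    """Hypothetical stand-in for one LLM API call (e.g. a chat completion)."""
    return "hypothesis: connection pool exhausted after the 14:02 deploy"

def triage_single_shot(alert_payload: str, metrics: str, recent_changes: str) -> str:
    # Everything the model needs is packed into one prompt: one call, one answer.
    prompt = (
        "You are an SRE triage assistant. Given the context below, return "
        "your single most likely root-cause hypothesis.\n\n"
        f"Alert: {alert_payload}\nMetrics: {metrics}\nChanges: {recent_changes}"
    )
    return call_model(prompt)
```

There is no loop to go wrong: latency, cost, and failure surface are all exactly one call.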

When iterative wins

Iterative agents make multiple model calls, with tool calls in between. They are the right shape when the answer requires evidence the prompt does not have, and the evidence has to be fetched dynamically based on intermediate reasoning.

Investigation tasks are iterative by nature. "What caused this regression" usually requires looking at one signal, forming a hypothesis, looking at another signal to confirm. You cannot prefetch all signals; you do not know which to prefetch until you have started reasoning.

Iterative agents cost 5-10x more than single-shot. Budget accordingly. The cost is justified for tasks single-shot cannot handle.
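The loop that makes iterative agents expensive looks roughly like this. A sketch with hypothetical `call_model` and `run_tool` stand-ins; a real agent would parse structured tool-call output from the model rather than canned dicts:

```python
def call_model(transcript: list[str]) -> dict:
    """Stub: returns either a tool request or a final answer."""
    if len(transcript) < 3:
        return {"action": "tool", "tool": "query_metrics", "args": "error_rate"}
    return {"action": "answer", "text": "regression caused by config change"}

def run_tool(tool: str, args: str) -> str:
    """Stub for logs/metrics/deploy-history fetchers."""
    return f"{tool}({args}) -> error_rate spiked at 14:02"

def investigate(question: str, max_turns: int = 7) -> str:
    transcript = [question]
    for _ in range(max_turns):  # hard cap bounds cost and latency
        step = call_model(transcript)
        if step["action"] == "answer":
            return step["text"]
        # Fetch the evidence the model asked for, then reason again.
        transcript.append(run_tool(step["tool"], step["args"]))
    return "inconclusive: turn budget exhausted"
```

Each pass through the loop is one model call plus one tool call, which is where the 5-10x multiplier comes from.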

The hybrid pattern in production

Most production SRE work fits a hybrid. Start single-shot with prefetched context. If the model says "I need more data," upgrade to iterative for that run. If it returns a confident hypothesis, exit.

This pattern captures most of the latency and cost benefit of single-shot for the easy cases, while preserving the depth of iterative for the hard ones. The implementation is a single "need more data" branch in the loop.
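A sketch of that branch, assuming a hypothetical sentinel string the single-shot prompt asks the model to emit when it lacks evidence; `single_shot` and `iterative` are placeholder stubs for the two paths above:

```python
def single_shot(context: str) -> str:
    """Stub: one model call over prefetched context."""
    return "NEED_MORE_DATA" if "novel" in context else "hypothesis: bad deploy"

def iterative(context: str) -> str:
    """Stub: the multi-turn investigation loop."""
    return "hypothesis: cache stampede after TTL change"

def triage(prefetched_context: str) -> str:
    answer = single_shot(prefetched_context)
    if "NEED_MORE_DATA" in answer:
        # Upgrade this run to the expensive path; easy cases never get here.
        return iterative(prefetched_context)
    return answer
```

The sentinel convention is an assumption; structured output (e.g. a JSON `confidence` field) works equally well as the escalation trigger.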

Track the rate at which runs go iterative. If it is below 20%, the prefetched context is doing its job. If it is above 60%, you should prefetch more. The right number depends on your traffic shape.
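Tracking the escalation rate needs nothing more than a counter. A minimal sketch; in production you would emit this as a metric rather than hold it in memory:

```python
class EscalationTracker:
    """Counts what fraction of runs upgraded from single-shot to iterative."""

    def __init__(self) -> None:
        self.total = 0
        self.iterative = 0

    def record(self, went_iterative: bool) -> None:
        self.total += 1
        self.iterative += int(went_iterative)

    def rate(self) -> float:
        return self.iterative / self.total if self.total else 0.0

tracker = EscalationTracker()
for went in [False, False, True, False, False]:
    tracker.record(went)
# tracker.rate() -> 0.2, i.e. 20%: the prefetched context is doing its job
```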

Latency math you should know

Single-shot p50 is one model call: 1-2 seconds for a small prompt, 4-8 seconds for a large one. Iterative p50 is 3-7 model calls plus tool calls: 15-45 seconds end to end.

For triage, where on-call is waiting, single-shot is the obvious choice. For postmortem drafting, where nobody is waiting, iterative is fine. Pick by who is waiting and how long they will tolerate.

p99 is where the choice gets brutal. Single-shot p99 might be 8 seconds; iterative p99 can be several minutes. Set timeouts and bounds accordingly. Never let an iterative run exceed a budget you would not pay.
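One way to enforce that bound is a wall-clock deadline checked every turn. A sketch, assuming a hypothetical `step_fn` that performs one model-plus-tools turn and returns `None` until it has an answer:

```python
import time

def investigate_with_budget(question: str, step_fn, budget_s: float = 120.0) -> str:
    """Run iterative turns until an answer arrives or the budget is spent."""
    deadline = time.monotonic() + budget_s
    while time.monotonic() < deadline:
        answer = step_fn(question)  # one model call + tool calls
        if answer is not None:
            return answer
    # Never let p99 run unbounded: fail loudly and hand off instead.
    return "timed out: escalating to on-call"
```

Pairing a turn cap (as in the loop earlier) with a wall-clock cap covers both slow individual calls and runaway turn counts.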

Cost math you should know

Single-shot at 4k tokens with a frontier model: ~$0.02 per run. Iterative with 5 turns: ~$0.10-0.30 per run. The 10x difference compounds; at high volume, it is the difference between a $5k/month bill and a $50k/month bill.
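The compounding is plain arithmetic. A sketch using the per-run figures above; prices and volume are illustrative, not quoted from any provider:

```python
def monthly_cost(runs_per_day: int, cost_per_run: float, days: int = 30) -> float:
    """Back-of-envelope monthly bill from per-run cost and daily volume."""
    return runs_per_day * days * cost_per_run

# At 8k runs/day: roughly $4.8k/month single-shot vs $48k/month iterative.
single_shot_bill = monthly_cost(8_000, 0.02)
iterative_bill = monthly_cost(8_000, 0.20)
```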

Use the cheaper model for single-shot if quality holds. The cost ratio between Haiku and Opus, or between GPT-4o-mini and GPT-4o, is 5-30x. The quality ratio is much closer to 1x for triage. Run the eval; do not assume.

For iterative, use a frontier model. The cost saving from a smaller model is overwhelmed by the quality lost across multiple turns. The math swings the other way.

What to do this week

Pick an iterative agent that you suspect is doing more work than needed. Add a single-shot path: prefetch the most common context, attempt single-shot first, fall back to iterative if confidence is low. Measure: cost should drop materially; latency p50 should drop; quality should hold. Most iterative agents have a single-shot escape route hiding in them.