Agentic SRE Advanced By Samson Tanimawo, PhD Published Jun 29, 2026 5 min read

Stopping Criteria for Iterative SRE Agents

When does the agent stop? Goal achieved. Goal unreachable. Budget exhausted. Confidence threshold hit. The four-criterion stop policy and the order they fire in.

The four criteria

Goal achieved: the agent self-reports it has met the success criterion. Stop and return.

Goal unreachable: the agent self-reports it cannot make further progress. Stop and escalate.

Budget exhausted: tokens, time, or actions have hit their cap. Stop and escalate with partial state.

Confidence threshold: the agent's confidence in its current hypothesis has crossed a threshold (high or low). Stop and act, or stop and escalate.

Order they fire

Budget first: a budget-exceeded run cannot be saved by goal-achieved. The cap is hard.

Confidence next: high confidence ends the run early; low confidence escalates early. Both save iteration cost.

Goal achieved/unreachable last: these are the agent's self-report. They fire only if no earlier criterion has fired.

How agents self-report goal status

Structured output that includes a status field: in_progress, done, blocked. The loop reads the field and dispatches.

The model is forced to evaluate progress every step. This is a feature: forces the model to slow down and think about whether it is done.

Self-reports are sometimes wrong. The eval suite tests goal-status accuracy directly; over time the prompt is tuned to be accurate about its own progress.

Confidence threshold trade

High threshold (>0.9): agent stops only when very sure. Reduces false positives but increases run length.

Low threshold (>0.6): agent stops earlier. Reduces run length but allows more uncertain conclusions.

Pick by use case. Triage tolerates lower confidence (will be human-reviewed). Auto-remediation requires high confidence (acting on uncertainty is dangerous).

Evaluating the stop policy

Cases where the agent should stop after 3 steps. Pass if it does, fail if it runs longer.

Cases where the agent should escalate at step 5. Pass if it does, fail if it continues.

Cases where the agent should hit the budget. Pass if budget enforcement works, fail if the agent silently truncates.