Agentic SRE Advanced By Samson Tanimawo, PhD Published Apr 20, 2026

Hand-off Patterns Between Triage and Remediation Agents

Triage produces a hypothesis. Remediation acts on it. This post covers the handoff schema, the validation step in between, and the cases where remediation should refuse the handoff.

The handoff schema

The triage output has three fields: a hypothesis (with a confidence score), the evidence (the data that supports the hypothesis), and a recommended action (the next step).

Each field is required. Missing fields fail the handoff; the remediation agent does not act on partial input.

The schema is versioned. Schema changes are coordinated so that both agents pick up a new version at the same time.
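The schema rules above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the field names, the integer SCHEMA_VERSION, the 0-to-1 confidence scale, and the HandoffError type are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Tuple

# Assumption: a single integer version that both agents are deployed with.
SCHEMA_VERSION = 1

# Assumption: these key names stand in for the fields the article lists.
REQUIRED_FIELDS = ("hypothesis", "confidence", "evidence", "recommended_action")


class HandoffError(ValueError):
    """Raised when a triage payload cannot be accepted as a handoff."""


@dataclass(frozen=True)
class Handoff:
    schema_version: int
    hypothesis: str
    confidence: float
    evidence: Tuple[Any, ...]
    recommended_action: str


def _is_missing(value: Any) -> bool:
    """Treat None and empty strings or collections as missing."""
    if value is None:
        return True
    return isinstance(value, (str, list, tuple, dict)) and len(value) == 0


def parse_handoff(payload: Mapping[str, Any]) -> Handoff:
    """Build a Handoff, or raise HandoffError on partial or mismatched input."""
    version = payload.get("schema_version")
    if version != SCHEMA_VERSION:
        raise HandoffError(f"unsupported schema_version: {version!r}")

    missing = [name for name in REQUIRED_FIELDS if _is_missing(payload.get(name))]
    if missing:
        raise HandoffError("missing required fields: " + ", ".join(missing))

    confidence = payload["confidence"]
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        raise HandoffError("confidence must be a number")
    if not 0.0 <= confidence <= 1.0:  # assumed scale: 0 to 1
        raise HandoffError("confidence must be between 0 and 1")

    return Handoff(
        schema_version=version,
        hypothesis=payload["hypothesis"],
        confidence=float(confidence),
        evidence=tuple(payload["evidence"]),  # assumed: a list of evidence items
        recommended_action=payload["recommended_action"],
    )
```

Raising on the first structural problem keeps the remediation agent from ever seeing a partial handoff, which matches the "does not act on partial input" rule.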

Validate before handing off

Confidence threshold: if triage confidence < 0.7, the handoff stops; the case escalates to a human instead.

Action allowlist: triage's recommended action must be in the remediation agent's tool allowlist. If not, the handoff fails.

Evidence freshness: the evidence must be less than 5 minutes old. Older evidence may be stale and should be refreshed before the remediation agent acts.
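The three gates can be expressed as one small function. The sketch below assumes the evidence carries a timezone-aware collection timestamp; the Verdict names, the example allowlist entries, and the order in which the gates run are illustrative choices, not something the pattern prescribes.

```python
from datetime import datetime, timedelta, timezone
from enum import Enum
from typing import FrozenSet, Optional

MIN_CONFIDENCE = 0.7                     # threshold from the text above
MAX_EVIDENCE_AGE = timedelta(minutes=5)  # freshness limit from the text above

# Hypothetical allowlist; a real one would come from the remediation agent.
ALLOWED_ACTIONS: FrozenSet[str] = frozenset({"restart_pod", "scale_deployment"})


class Verdict(Enum):
    PROCEED = "proceed"
    ESCALATE_TO_HUMAN = "escalate_to_human"
    FAIL = "fail"
    REFRESH_EVIDENCE = "refresh_evidence"


def validate_handoff(
    confidence: float,
    action: str,
    evidence_collected_at: datetime,
    now: Optional[datetime] = None,
    allowed_actions: FrozenSet[str] = ALLOWED_ACTIONS,
) -> Verdict:
    """Apply the confidence, allowlist, and freshness gates in that order."""
    if now is None:
        now = datetime.now(timezone.utc)

    if confidence < MIN_CONFIDENCE:
        return Verdict.ESCALATE_TO_HUMAN
    if action not in allowed_actions:
        return Verdict.FAIL
    # Evidence must be strictly younger than the limit to count as fresh.
    if now - evidence_collected_at >= MAX_EVIDENCE_AGE:
        return Verdict.REFRESH_EVIDENCE
    return Verdict.PROCEED
```

Passing `now` explicitly makes the freshness gate deterministic in tests; in production the default of the current UTC time would normally be used.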

When remediation should refuse

Hypothesis is implausible given the evidence. The remediation agent does its own check; if the hypothesis does not match the evidence, refuse.

Action is high-risk and confidence is borderline. The remediation agent escalates rather than acting.

Required pre-conditions are not met. Pre-flight checks fail; the action does not fire.
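A rough sketch of the remediation-side decision follows. It assumes the agent's own plausibility check has already produced a boolean, and it invents a high-risk action set and a "borderline" confidence band (below 0.85) purely for illustration; neither value comes from the pattern itself.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Tuple

# Hypothetical values: which actions count as high-risk, and where
# "borderline" confidence ends, are deployment-specific choices.
HIGH_RISK_ACTIONS = frozenset({"failover_database", "drain_node"})
BORDERLINE_BELOW = 0.85


@dataclass(frozen=True)
class Decision:
    outcome: str  # one of "act", "refuse", "escalate"
    reason: str


def decide(
    action: str,
    confidence: float,
    hypothesis_matches_evidence: bool,
    preflight_checks: Iterable[Tuple[str, Callable[[], bool]]],
) -> Decision:
    """Return the remediation agent's decision for an already-validated handoff."""
    # 1. The agent's own check disagrees with triage: refuse.
    if not hypothesis_matches_evidence:
        return Decision("refuse", "hypothesis does not match the evidence")

    # 2. High-risk action with borderline confidence: escalate, do not act.
    if action in HIGH_RISK_ACTIONS and confidence < BORDERLINE_BELOW:
        return Decision("escalate", "high-risk action with borderline confidence")

    # 3. Any failing pre-flight check blocks the action.
    for name, check in preflight_checks:
        if not check():
            return Decision("refuse", f"pre-flight check failed: {name}")

    return Decision("act", "all checks passed")
```

For example, `decide("restart_pod", 0.9, True, [("replicas_available", lambda: True)])` returns a Decision with outcome "act". Returning a reason alongside the outcome gives the audit log something concrete to record for every refusal.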

Auditing the handoff chain

Each handoff is logged: triage agent, hypothesis, confidence, recommended action, remediation agent, action taken (or refused).

The chain is reconstructable for any past run. A question such as "Why did the remediation agent restart this pod?" can be answered from the chain alone.

When things go wrong, the handoff log is the first thing to read. Was the bug in triage, in remediation, or in the handoff itself?
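One simple way to keep such a log is one JSON object per line. The sketch below records the fields listed above; the `run_id` and `logged_at` fields, the key names, and the two helper functions are assumptions added so that a past run can be looked up.

```python
import json
from datetime import datetime, timezone
from typing import Any, Dict, Iterable, List, Optional, TextIO


def log_handoff(
    stream: TextIO,
    *,
    run_id: str,
    triage_agent: str,
    hypothesis: str,
    confidence: float,
    recommended_action: str,
    remediation_agent: str,
    action_taken: Optional[str],  # None means the handoff was refused
    logged_at: Optional[datetime] = None,
) -> Dict[str, Any]:
    """Append one handoff record to the stream as a single JSON line."""
    if logged_at is None:
        logged_at = datetime.now(timezone.utc)
    record = {
        "run_id": run_id,  # hypothetical correlation key
        "logged_at": logged_at.isoformat(),
        "triage_agent": triage_agent,
        "hypothesis": hypothesis,
        "confidence": confidence,
        "recommended_action": recommended_action,
        "remediation_agent": remediation_agent,
        "action_taken": action_taken,
        "refused": action_taken is None,
    }
    stream.write(json.dumps(record, sort_keys=True) + "\n")
    return record


def chain_for_run(lines: Iterable[str], run_id: str) -> List[Dict[str, Any]]:
    """Return every logged handoff for one run, in the order it was written."""
    records = (json.loads(line) for line in lines if line.strip())
    return [record for record in records if record["run_id"] == run_id]
```

Because each record is self-describing, reading the chain for a run shows directly whether triage recommended the wrong action, remediation took a different one, or the handoff was refused in between.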

Eval cases for the chain

Successful handoff: triage produces correct hypothesis; remediation acts; outcome matches expected.

Refusal handoff: triage produces correct hypothesis with low confidence; remediation correctly refuses.

Stale handoff: evidence is old; remediation correctly refuses or refreshes.

Wrong-action handoff: triage suggests an action outside remediation's scope; the handoff correctly fails.
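The four cases lend themselves to a table-driven eval. In the sketch below, `run_chain` is a stand-in for the real triage-to-remediation chain, and the outcome labels, allowlist, and case inputs are hypothetical; the point is the shape of the table, with a set of acceptable outcomes per case so that the stale case can pass on either a refusal or a refresh.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List

ALLOWED_ACTIONS = frozenset({"restart_pod"})  # hypothetical allowlist


@dataclass(frozen=True)
class EvalCase:
    name: str
    confidence: float
    evidence_age_s: float
    action: str
    acceptable: FrozenSet[str]  # outcomes that count as a pass


def run_chain(confidence: float, evidence_age_s: float, action: str) -> str:
    """Stand-in for the real chain; returns an outcome label."""
    if action not in ALLOWED_ACTIONS:
        return "fail"
    if confidence < 0.7:
        return "refuse"
    if evidence_age_s >= 300:
        return "refresh"
    return "act"


CASES: List[EvalCase] = [
    EvalCase("successful", 0.92, 30, "restart_pod", frozenset({"act"})),
    EvalCase("refusal", 0.55, 30, "restart_pod", frozenset({"refuse"})),
    EvalCase("stale", 0.92, 900, "restart_pod", frozenset({"refuse", "refresh"})),
    EvalCase("wrong_action", 0.92, 30, "delete_namespace", frozenset({"fail"})),
]


def run_evals(chain: Callable[[float, float, str], str] = run_chain) -> Dict[str, bool]:
    """Run every case through the chain and report pass or fail per case."""
    return {
        case.name: chain(case.confidence, case.evidence_age_s, case.action)
        in case.acceptable
        for case in CASES
    }
```

Swapping the real chain in for `run_chain` turns the same table into a regression suite: a chain that always acts would pass only the "successful" case.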