Agentic SRE Advanced · By Samson Tanimawo, PhD · Published Aug 4, 2026 · 5 min read

Building Your First SRE Agent: A 30-Minute Walkthrough

From empty repo to a working triage agent in half an hour. The minimum viable architecture, the three tools your first agent needs, and the trap most teams fall into on day two.

Pick a scope you can finish in 30 minutes

The first SRE agent that ships is always narrower than the one you imagined. Pick a scope where the inputs are bounded, the success criterion is unambiguous, and the failure mode is recoverable. A good first scope: triage a single alert type, return a structured hypothesis. A bad first scope: handle any incident the on-call gets.

If you cannot describe the success criterion in one sentence, the scope is wrong. Narrow until you can. "Given a Postgres p95-latency alert, return the three most likely causes ranked by probability" is a sentence. "Help on-call" is not.

The 30-minute target is a forcing function. It cuts decision branches. It rules out custom infra. It pushes you toward the smallest agent that earns its keep.

The minimum viable loop

Your first agent is a function: input goes in, hypothesis comes out. There is no need for memory, multi-step planning, or tool chaining. A single LLM call with a tightly scoped prompt and one read-only tool will solve more SRE problems than most teams admit.

The loop is: pull the alert payload, fetch the last 30 minutes of relevant metrics, hand both to the model with a structured-output schema, return the model's output. That is it. Anything else can wait for week two.
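The whole loop fits in one function. A minimal sketch, with the metric fetch and the model call stubbed out (both are assumptions; swap in your real metrics client and LLM provider):

```python
import json

def fetch_metrics(service: str, minutes: int) -> dict:
    """Pull the last `minutes` of relevant metrics. Stubbed for the sketch."""
    return {"p95_ms": [820, 910, 1340], "connections": [48, 51, 97]}

# A JSON-Schema-style shape for the structured output.
HYPOTHESIS_SCHEMA = {
    "type": "object",
    "properties": {
        "causes": {"type": "array", "items": {"type": "string"}, "maxItems": 3},
    },
    "required": ["causes"],
}

def call_model(prompt: str, schema: dict) -> dict:
    """One LLM call with a structured-output schema. Stubbed for the sketch."""
    return {"causes": ["connection pool exhaustion", "missing index", "noisy neighbor"]}

def triage(alert: dict) -> dict:
    """The entire v1 agent: alert in, ranked hypothesis out. Stateless."""
    metrics = fetch_metrics(alert["service"], minutes=30)
    prompt = (
        "Given this alert and the last 30 minutes of metrics, "
        "return the three most likely causes ranked by probability.\n"
        f"Alert: {json.dumps(alert)}\nMetrics: {json.dumps(metrics)}"
    )
    return call_model(prompt, HYPOTHESIS_SCHEMA)
```

Note there is no state anywhere: `triage` can be called, tested, and replayed in isolation.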

Resist the urge to add memory in version one. Memory is where bugs hide. A stateless agent that runs cleanly per invocation is dramatically easier to debug than a stateful one that drifts over time.

The three tools your first agent needs

Tool one: a metric-query tool. Read-only, with a tight allowlist of metrics, scoped to a service. The agent should be able to ask "what was p99 for service X over the last hour" and get a single number back.
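A sketch of what that allowlist gate looks like; the metric names and the hard-coded service scope are assumptions, and the return value stands in for a real PromQL or Datadog query:

```python
# Read-only metric tool: tight allowlist, scoped to one service in v1.
ALLOWED_METRICS = {"p95_latency_ms", "p99_latency_ms", "error_rate", "connections"}

def query_metric(service: str, metric: str, window_minutes: int = 60) -> float:
    if metric not in ALLOWED_METRICS:
        raise ValueError(f"metric {metric!r} not in allowlist")
    if service != "postgres":  # widen deliberately, later, with evals
        raise ValueError(f"tool is scoped to 'postgres', got {service!r}")
    # Replace with a real metrics-backend query; stubbed for the sketch.
    return 1340.0
```

The point of failing loudly on anything outside the allowlist is that the model learns the tool's boundaries from the error messages.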

Tool two: a recent-events tool. Reads the last N deploys, config changes, or feature-flag flips. Most incidents correlate with a recent change; surfacing that change is half the work of triage.
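One shape this tool can take, assuming a hypothetical event feed you would back with your deploy system and flag service:

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class ChangeEvent:
    kind: str       # "deploy" | "config" | "flag"
    service: str
    at: datetime
    summary: str

def recent_events(events: list[ChangeEvent], service: str, last_n: int = 5) -> list[ChangeEvent]:
    """Return the last N changes for a service, newest first."""
    mine = [e for e in events if e.service == service]
    return sorted(mine, key=lambda e: e.at, reverse=True)[:last_n]
```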

Tool three: a logs-search tool. Bounded to the affected service, with a token budget per query so a single search cannot blow up the run. This is the tool you will gate most carefully because it is the easiest to abuse.
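The token budget can be enforced at the tool boundary, so no prompt downstream can grow without bound. A sketch, using a rough four-characters-per-token estimate (the budget number and the fake backend are assumptions):

```python
MAX_TOKENS_PER_QUERY = 2000  # tune to your context window and cost budget

def search_logs(service: str, query: str, raw_search=None) -> str:
    """Logs search bounded by a per-query token budget."""
    lines = (raw_search or _fake_search)(service, query)
    out, budget = [], MAX_TOKENS_PER_QUERY
    for line in lines:
        cost = max(1, len(line) // 4)  # rough 4-chars-per-token estimate
        if cost > budget:
            out.append("... [truncated: token budget reached]")
            break
        out.append(line)
        budget -= cost
    return "\n".join(out)

def _fake_search(service, query):
    # Stand-in for a real log backend; returns deliberately noisy lines.
    return [f"{service}: error line {i} " + "x" * 400 for i in range(100)]
```

Because truncation is explicit in the tool output, the model knows the result is partial and can narrow its next query instead of hallucinating the missing tail.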

The trap most teams fall into on day two

Day one ships a working agent. Day two everyone wants to expand its scope. Resist. The expansion that breaks most agents is letting the agent take actions instead of just reading. Read-only agents are the ones that earn trust.

The other day-two trap is adding more tools. Each tool widens the surface where the model can confuse itself. Add a new tool only after you have an eval suite that proves the existing tools are reliable.

When you do expand, expand by use case, not by capability. "Now triage this second alert type" is safer than "now also act on the first alert type."

What to ship at the end of 30 minutes

A Python file or notebook with the agent loop. A single tool wired up. Three test cases the agent passes. A way to run it on demand. That is the deliverable; ship it, share the link, and let the team poke at it.

Add a logging line for every model call: prompt size, response, latency, cost. The first regression you catch will be a cost regression, and the log line is the only artefact that lets you diagnose it.
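One way to get that log line for free is a thin wrapper around the model call. A sketch; the cost formula is a placeholder, so substitute your provider's real token accounting:

```python
import json
import logging
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("agent")

def logged_model_call(call_fn, prompt: str, price_per_1k_chars: float = 0.001) -> str:
    """Wrap every model call with the numbers you will need to debug a regression."""
    start = time.monotonic()
    response = call_fn(prompt)
    latency_ms = (time.monotonic() - start) * 1000
    est_cost = (len(prompt) + len(response)) / 1000 * price_per_1k_chars
    log.info(json.dumps({
        "prompt_chars": len(prompt),
        "response_chars": len(response),
        "latency_ms": round(latency_ms, 1),
        "est_cost_usd": round(est_cost, 6),
    }))
    return response
```

Emitting the line as JSON means the first cost regression is a one-line query against your logs rather than an archaeology project.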

Plan version two before you celebrate. Version two adds the second tool, the eval harness, and a confidence score in the structured output. None of these are required for version one.
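The confidence score is a small change to the structured-output schema. A hypothetical version-two shape, extending the hypothesis output with a per-cause confidence:

```python
# v2 output schema: same ranked causes, now each with a confidence in [0, 1].
HYPOTHESIS_V2_SCHEMA = {
    "type": "object",
    "properties": {
        "causes": {
            "type": "array",
            "maxItems": 3,
            "items": {
                "type": "object",
                "properties": {
                    "cause": {"type": "string"},
                    "confidence": {"type": "number", "minimum": 0, "maximum": 1},
                },
                "required": ["cause", "confidence"],
            },
        },
    },
    "required": ["causes"],
}
```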

Common antipatterns

Too broad a scope. Trying to handle "any production incident" on day one. Pick one alert type, one service, one decision.

Tool overload. Wiring six tools because you might need them. The model picks the wrong one and the agent loses focus. Three tools, no exceptions.

Premature memory. Adding a vector store before you have a stateless baseline. You will spend more time debugging memory than improving the agent.

What to do this week

Pick the alert your team gets paged on most. Write the sentence that describes a successful triage of that alert. Build the agent that produces exactly that output, with three tools, in 30 minutes. Run it on the next page; compare its output to what the human on-call concluded.