Agentic SRE Advanced
By Samson Tanimawo, PhD · Published Aug 2, 2026 · 5 min read

Agent Prompts vs Agent Code: When Each Wins

Some agent logic belongs in the prompt. Some belongs in deterministic code. The decision rule that keeps your agent reliable, debuggable, and cheap.

The rule that holds in production

Anything deterministic belongs in code. Anything that requires judgement belongs in the prompt. The exception is when the deterministic answer is too tedious to code; then it sometimes still belongs in the prompt, but you accept the cost.

"Did latency exceed the threshold?" is deterministic. Code. "Is this latency pattern characteristic of GC pressure or a slow query?" is judgement. Prompt. The boundary between the two is where most agent bugs live.

When in doubt, lean toward code. Code is debuggable, testable, and free at runtime. Prompts are stochastic, expensive, and prone to silent regressions every time the model updates.

Concrete examples that get this wrong

Mistake one: using the LLM to route a request that a regex match could handle. The model has to read the input, decide which service is affected, and dispatch. A regex would have been cheaper, faster, and 100% accurate.
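A sketch of the deterministic version; the routing table and patterns are invented for illustration.

```python
import re

# Hypothetical routing table; pattern contents are illustrative.
SERVICE_PATTERNS = {
    "checkout": re.compile(r"\b(checkout|payment|cart)\b", re.IGNORECASE),
    "auth": re.compile(r"\b(login|token|session)\b", re.IGNORECASE),
}


def route(alert_text: str) -> str | None:
    # Deterministic dispatch: zero tokens, zero model latency, reproducible.
    for service, pattern in SERVICE_PATTERNS.items():
        if pattern.search(alert_text):
            return service
    return None  # reserve the LLM for inputs no pattern matches
```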

Mistake two: aggregating numeric data inside the prompt. "Here are 50 metric points; what is the mean?" The model will hallucinate an answer. Compute the mean in code; pass the result to the model for interpretation.
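A sketch of that division of labour: code aggregates, the model interprets. The summary fields are illustrative.

```python
import statistics


def summarize_metric(points: list[float]) -> dict[str, float]:
    # Aggregate in code; never ask the model to do arithmetic.
    return {
        "mean": statistics.mean(points),
        "p95": statistics.quantiles(points, n=20)[18],  # 19 cut points; index 18 is the 95th
        "max": max(points),
    }


# The model only sees the computed result, never the raw points:
# prompt = f"Latency summary: {summarize_metric(points)}. What is the likely cause?"
```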

Mistake three: encoding business rules as natural language in the prompt. "Severity is critical if revenue impact is over $10k." Now the threshold lives in two places (the prompt and the alerting rule). Keep it in code so it has one home.
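One home for the rule might look like this; the $10k figure comes from the example above, everything else is assumed.

```python
# The alerting rule and the agent both import this constant;
# the prompt never mentions the number.
CRITICAL_REVENUE_IMPACT_USD = 10_000


def severity(revenue_impact_usd: float) -> str:
    # "major" as the fallback tier is an assumption for illustration.
    return "critical" if revenue_impact_usd > CRITICAL_REVENUE_IMPACT_USD else "major"
```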

The hybrid pattern that works

Most production agents are 80% code and 20% prompt. Code does the orchestration, the parsing, the validation, the deterministic routing. The prompt does the reasoning step that needs judgement.

Wrap each LLM call in a function with a clear contract: typed inputs, typed outputs, validation on both sides. The prompt is an implementation detail of that function. Treat the prompt the way you treat any internal helper: replaceable, scoped, tested.
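A sketch of that contract, reusing the hypothetical `call_llm` stub from earlier; the `cause,confidence` wire format is invented for illustration.

```python
from dataclasses import dataclass

ALLOWED_CAUSES = {"gc_pressure", "slow_query", "unknown"}


@dataclass(frozen=True)
class Diagnosis:
    cause: str         # one of ALLOWED_CAUSES
    confidence: float  # 0.0 to 1.0


def diagnose(summary: dict[str, float]) -> Diagnosis:
    # The prompt is an implementation detail of this function.
    raw = call_llm(
        f"Given this latency summary: {summary}, reply exactly as "
        "`cause,confidence` where cause is gc_pressure, slow_query, or unknown."
    )
    cause, conf = raw.split(",", maxsplit=1)
    result = Diagnosis(cause=cause.strip(), confidence=float(conf))
    # Validate before the output touches the rest of the system.
    if result.cause not in ALLOWED_CAUSES or not 0.0 <= result.confidence <= 1.0:
        raise ValueError(f"model returned out-of-contract output: {raw!r}")
    return result
```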

When the model gets better, you swap the prompt without touching the surrounding code. When the orchestration changes, you swap the code without touching the prompt. The separation pays compounding dividends.

Why this matters when something breaks

When code breaks, you read a stack trace. When a prompt breaks, you read 50 transcripts and squint. The cost of debugging stochastic logic is several times the cost of debugging deterministic logic. Keeping prompt scope small keeps the debugging cost small.

If you can write a unit test for the behaviour, write it in code. If the behaviour requires an eval (sample of N runs, scored), it goes in the prompt. The line between unit-testable and eval-required is the line between code and prompt.
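The same split, in test form, building on the sketches above; `cases` is an assumed list of dicts holding a precomputed summary and an expected cause.

```python
# Deterministic behaviour: a unit test, pass or fail.
def test_latency_exceeded():
    assert latency_exceeded(501.0)
    assert not latency_exceeded(499.0)


# Judgement: an eval, N runs scored as an accuracy rate.
def eval_diagnose(cases: list[dict], n_runs: int = 20) -> float:
    correct = 0
    for case in cases:
        for _ in range(n_runs):
            if diagnose(case["summary"]).cause == case["expected"]:
                correct += 1
    return correct / (len(cases) * n_runs)
```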

Prompt regressions are silent. A prompt that was 95% accurate yesterday might be 88% accurate today after a model update. Code regressions are loud. Push as much of the work as possible out of the prompt and into code, so regressions have less surface to hide in.

Review the boundary every quarter

Once a quarter, look at the prompts in your agent and ask: which of these have stable behaviour I could now encode as code? Often the answer is several. The prompts have served as a fuzz test that revealed the deterministic rule.

Promote the rule out of the prompt and into code. The prompt gets shorter, cheaper, and more focused. The code gets a unit test. The agent gets faster and more reliable. This is one of the few refactors that strictly dominates.
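A before-and-after sketch of one such promotion; the 2% error-rate rule is invented to stand in for whatever your prompts surfaced.

```python
# Before: the prompt said "if the error rate is above 2%, treat it as an outage."
# After: the rule lives in code, with a unit test, and the prompt no longer mentions it.
OUTAGE_ERROR_RATE = 0.02  # hypothetical threshold


def is_outage(error_rate: float) -> bool:
    return error_rate > OUTAGE_ERROR_RATE


def test_is_outage():
    assert is_outage(0.03)
    assert not is_outage(0.01)
```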

Track the size of the prompt over time. If it grows monotonically, you are not refactoring enough. The healthy pattern is grow-then-prune, with quarterly pruning.
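One cheap way to track it, assuming prompts live as text files in your repo; log the counts per release and watch the trend.

```python
from pathlib import Path


def prompt_word_counts(prompt_dir: str = "prompts/") -> dict[str, int]:
    # Word count is a rough but stable proxy for prompt size.
    return {
        p.name: len(p.read_text().split())
        for p in Path(prompt_dir).glob("*.txt")
    }
```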

Common antipatterns

Routing in the prompt. Asking the model to decide which service is affected when a regex would do. Cheaper and more reliable in code.

Numeric reasoning in the prompt. Asking the model to compute means, percentiles, or differences. Compute in code; let the model interpret.

Business rules in the prompt. Hard-coding thresholds in natural language. Keep them in one place — code — so updates are atomic.

What to do this week

Pull up your largest agent prompt. Highlight the lines that encode deterministic behaviour. Promote at least one of those lines out of the prompt and into a Python helper. Re-run the eval suite; you should see latency drop, cost drop, and accuracy hold or improve.