Choosing Between One Big Agent and Five Specialists

One generalist with all tools is simpler. Specialists are more reliable. Here is the decision rule, with cost numbers, for picking the right shape per use case.

The rule

The choice between one big agent and five specialists is not a matter of taste. It maps cleanly to the failure modes the agent will see in production and to whether each specialty has an owner who keeps it sharp.

Specialists win on diverse failures. When mean time to recovery (MTTR) matters and the failure shapes are diverse, dedicated specialists beat a generalist on accuracy and latency.
Specialists need owners. Each specialty needs its own eval suite and its own prompt-engineering owner. Without dedicated owners, specialists rot within a quarter.
Generalists win on narrow scope. Single-domain or low-volume agents do not pay back the orchestration tax of specialists.
Default to specialists. Most production SRE work is diverse failure modes. Resist the simplification of “one big agent” once you see real traffic.
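
The rule above can be written down as a small function. This is a minimal sketch, not a definitive implementation: the `AgentContext` fields, the `choose_shape` name, and the cut-off of five runs per day (standing in for "a handful") are all assumptions made for illustration, as is the choice to fall back to a generalist when specialties lack owners.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentContext:
    """Hypothetical inputs to the decision rule; field names are illustrative."""
    domains: int            # distinct failure domains the agent must cover
    runs_per_day: float     # observed or expected traffic
    owned_specialties: int  # specialties with a named eval and prompt owner


# Assumed threshold: the article only says "a handful of runs per day".
LOW_VOLUME_RUNS_PER_DAY = 5


def choose_shape(ctx: AgentContext) -> str:
    """Return "generalist" or "specialists" following the rule in the text."""
    # Narrow scope: one domain or low volume does not repay orchestration.
    if ctx.domains <= 1 or ctx.runs_per_day < LOW_VOLUME_RUNS_PER_DAY:
        return "generalist"
    # Assumption: unowned specialists rot, so without one owner per
    # domain the sketch keeps the generalist until owners exist.
    if ctx.owned_specialties < ctx.domains:
        return "generalist"
    # Diverse failure modes with owners in place: the default.
    return "specialists"


if __name__ == "__main__":
    sre = AgentContext(domains=5, runs_per_day=200, owned_specialties=5)
    print(choose_shape(sre))
```

The ordering of the checks is the point of the sketch: scope and ownership are tested before the default applies, so "default to specialists" only holds once the preconditions are met.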

Cost numbers

Cost rarely decides the architecture because the two shapes cost nearly the same per run. The decision turns on reliability, not the invoice.

Generalist run. One prompt, around 8k tokens, roughly $0.04 per run. Latency is higher because the single prompt is larger.
Specialist run. Five prompts at around 2k tokens each, roughly $0.05 per run total. Each specialist is faster individually; orchestration adds latency overhead.
Reliability gap. Tighter specialist prompts produce more reliable behaviour on their domain than a sprawling generalist prompt covering all five.
Eval cost. Specialist eval suites are smaller and run faster, which reduces the overall cost of every change to a prompt or tool.
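
The per-run figures above imply a flat price of about $0.005 per 1k tokens ($0.04 for 8k tokens). The sketch below reproduces the arithmetic under that assumption; the price is derived from the article's own numbers, not from any vendor's price list, and `run_cost` is a hypothetical helper.

```python
# Implied blended price: $0.04 for ~8k tokens, i.e. $0.005 per 1k tokens.
# Derived from the figures in the text, not from a real price list.
PRICE_PER_1K_TOKENS = 0.04 / 8


def run_cost(prompts: int, tokens_per_prompt: int) -> float:
    """Cost of one run in dollars, assuming a flat per-token price."""
    return prompts * tokens_per_prompt / 1000 * PRICE_PER_1K_TOKENS


generalist = run_cost(prompts=1, tokens_per_prompt=8_000)
specialists = run_cost(prompts=5, tokens_per_prompt=2_000)

if __name__ == "__main__":
    print(f"generalist:  ${generalist:.3f} per run")
    print(f"specialists: ${specialists:.3f} per run")
    print(f"difference:  ${specialists - generalist:.3f} per run")
```

The gap is about one cent per run. The specialist figure assumes all five prompts execute on every run; if the orchestrator dispatches to fewer specialists, the gap narrows further.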

When generalists win

The generalist shape is the right answer in four situations. Each is about scope being too narrow to amortise the specialist tax.

Single-domain agents. Only DNS, only Kubernetes, only databases. Within one domain a generalist is fine; specialisation does not help.
Low-volume agents. At fewer than a handful of runs per day, orchestration overhead is not amortised, so the simpler shape wins.
Early-stage agents. Ship a generalist first, observe the failure modes for a quarter, and split into specialists when the data tells you where the seams are.
Internal demos. Tools meant to inspire, not to operate. The generalist is easier to demo and easier to throw away.

How to split into specialists

Splitting by tool rather than by failure mode is the most common mistake. The split below scales; splitting by tool does not.

By failure mode. “Database specialist” handles all database failures, not just “queries” or “connections.” The boundary follows what fails together.
Independent products. Each specialist gets its own prompt, its own tools, its own eval suite. Treat them as independent products with their own roadmaps.
Deterministic orchestration. Orchestration is plain code, not a meta-agent. The orchestrator dispatches on signal type; the dispatch is reproducible without an LLM call.
Shared scratchpad. Specialists communicate through a typed scratchpad, not free-text. The scratchpad is what makes hand-offs reviewable.
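
A deterministic orchestrator with a typed scratchpad might look like the sketch below. Everything named here is hypothetical: the `Finding` and `Scratchpad` fields, the signal types in `ROUTES`, and the placeholder specialists, which stand in for real prompt and tool calls. It assumes Python 3.9 or later.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Finding:
    """One typed hand-off entry; the fields are illustrative, not a fixed schema."""
    specialist: str
    summary: str
    confidence: float


@dataclass
class Scratchpad:
    """Shared, typed state that specialists append to and reviewers can read."""
    incident_id: str
    findings: list[Finding] = field(default_factory=list)


Specialist = Callable[[dict, Scratchpad], None]


def database_specialist(signal: dict, pad: Scratchpad) -> None:
    # Placeholder: a real specialist would run its own prompt and tools here.
    pad.findings.append(Finding("database", f"inspected {signal['source']}", 0.5))


def network_specialist(signal: dict, pad: Scratchpad) -> None:
    # Placeholder, as above.
    pad.findings.append(Finding("network", f"inspected {signal['source']}", 0.5))


# Dispatch table keyed on signal type: plain code, no LLM call.
# Both database signals go to one specialist, so the split follows
# the failure mode rather than the individual tool.
ROUTES: dict[str, Specialist] = {
    "db_latency": database_specialist,
    "db_connection_errors": database_specialist,
    "packet_loss": network_specialist,
}


def dispatch(signal: dict, pad: Scratchpad) -> str:
    """Route a signal to its specialist and return that specialist's name."""
    handler = ROUTES.get(signal["type"])
    if handler is None:
        raise KeyError(f"no specialist registered for signal type {signal['type']!r}")
    handler(signal, pad)
    return pad.findings[-1].specialist


if __name__ == "__main__":
    pad = Scratchpad(incident_id="example-incident")
    print(dispatch({"type": "db_latency", "source": "primary"}, pad))
    print(pad.findings)
```

Because routing is a dictionary lookup, the same signal always reaches the same specialist, and an unknown signal type fails loudly instead of being guessed at. The scratchpad's typed entries are what a reviewer reads after the fact.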

When to rejoin specialists into a generalist

Rejoining is rare. Most evolution moves toward more specialisation, not less, but three signals point the other direction.

Convergent prompts. If two specialists keep converging on similar prompts and similar failures, consider rejoining. Maintaining duplicate effort is waste.
Low usage. A specialist with fewer than five runs per week is not paying its keep. Rejoin into a sibling and let the prompt grow slightly.
Domain collapse. When two domains genuinely merge in your infrastructure (a managed service replaces three internal ones), the specialists merge with the domains.
Default to specialising. Most evolution is the other direction; specialists deepen as failure modes proliferate, not the reverse.
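
Two of the rejoin signals can be checked mechanically. The sketch below is one rough way to do it: the five-runs-per-week floor comes from the text, while the 0.8 similarity threshold and the use of `difflib` text similarity as a proxy for "converging prompts" are assumptions. Textual similarity says nothing about whether the failures are similar, so treat a hit as a prompt for human review rather than a verdict.

```python
from difflib import SequenceMatcher

MIN_RUNS_PER_WEEK = 5          # from the text: fewer than five runs per week
PROMPT_SIMILARITY_LIMIT = 0.8  # assumed threshold; tune against your own prompts


def low_usage(runs_last_week: int) -> bool:
    """True when a specialist is below the usage floor."""
    return runs_last_week < MIN_RUNS_PER_WEEK


def prompts_converging(prompt_a: str, prompt_b: str) -> bool:
    """Crude proxy: True when two prompts are textually near-identical."""
    ratio = SequenceMatcher(None, prompt_a, prompt_b).ratio()
    return ratio >= PROMPT_SIMILARITY_LIMIT


if __name__ == "__main__":
    print(low_usage(3))
    print(prompts_converging("Check replica lag first.", "Check replica lag first."))
```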