Multi-Agent Systems: Orchestrating Specialists
One generalist agent does everything badly. Five specialist agents, coordinated, do everything well. Multi-agent systems are how production AI moves past chatbot-with-tools.
One generalist agent vs many specialists
A single agent doing many roles fails the same way one overloaded engineer does: it switches contexts poorly, drops details, and runs out of working memory.
Multi-agent splits the work. One agent classifies an incoming alert. Another retrieves relevant context. A third proposes a remediation. A fourth validates against policy. Each is small, focused, debuggable on its own.
The wins compound. Specialist prompts are simpler. Specialist tool sets are smaller. Specialist evals are tractable. Failures are localised.
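The alert-handling pipeline above can be sketched as a chain of small specialists passing one state object along. This is a minimal illustration, not a production implementation: the functions are stubs standing in for LLM calls, and all names (`handle_alert`, the runbook contents) are hypothetical.

```python
def classify(state: dict) -> dict:
    # Stub for an LLM classifier: tag the alert with a category.
    state["category"] = "disk" if "disk" in state["alert"] else "unknown"
    return state

def retrieve(state: dict) -> dict:
    # Stub retriever: look up runbook context for the category.
    runbooks = {"disk": "Free space, then restart the log shipper."}
    state["context"] = runbooks.get(state["category"], "")
    return state

def propose(state: dict) -> dict:
    # Stub remediator: draft an action from the retrieved context.
    state["proposal"] = (
        f"Suggested fix: {state['context']}" if state["context"]
        else "Escalate to on-call."
    )
    return state

def validate(state: dict) -> dict:
    # Stub policy check: anything needing escalation is not auto-approved.
    state["approved"] = "Escalate" not in state["proposal"]
    return state

PIPELINE = [classify, retrieve, propose, validate]

def handle_alert(alert: str) -> dict:
    state = {"alert": alert}
    for agent in PIPELINE:
        state = agent(state)  # each specialist reads and extends the state
    return state
```

Because each stage is an ordinary function over a shared dict, each specialist can be tested and evaluated in isolation before any model is wired in.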
Three orchestration patterns
Manager + workers. A manager agent decomposes the task and dispatches subtasks to specialist workers. Workers report back; the manager assembles the answer. Most common pattern. Easy to reason about. The manager is the failure mode (if it plans badly, everything downstream is wasted).
Peer-to-peer. Agents communicate directly with each other, no central manager. Useful when the task is genuinely collaborative (debate, consensus). Harder to debug; convergence isn’t guaranteed.
Hierarchical / layered. Multiple levels of managers and workers. Used at scale where one manager can’t see everything. Common in large agent systems but adds significant complexity.
For first multi-agent projects, manager + workers is the pragmatic default. The manager is a router with a prompt. Workers are independent agents. Connect with structured messages.
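A manager-as-router with structured messages can be sketched in a few lines. Assumptions to note: the plan is hard-coded here where a real manager would use an LLM to decompose the task, and the worker names and JSON fields are invented for illustration.

```python
import json

def triage_worker(payload: str) -> str:
    # Worker returns a structured (JSON) message, not free text.
    return json.dumps({"severity": "high" if "outage" in payload else "low"})

def summarise_worker(payload: str) -> str:
    return json.dumps({"summary": payload[:60]})

WORKERS = {"triage": triage_worker, "summarise": summarise_worker}

def manager(task: str) -> dict:
    # Stub planner: a real manager would decompose the task with an LLM call.
    plan = ["triage", "summarise"]
    results = {}
    for role in plan:
        # Dispatch the subtask, parse the worker's structured reply.
        results[role] = json.loads(WORKERS[role](task))
    return results
```

Keeping worker replies as parseable JSON rather than prose is what makes the manager debuggable: a misroute or a malformed reply fails loudly at the `json.loads` boundary instead of silently downstream.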
Shared memory and state
Agents run independently, but they still need shared state. Three common options:
- Workspace: a shared structured object (typically JSON) that all agents read from and write to. Simple, transparent, easy to debug.
- Message bus: agents publish events; subscribed agents react. Decouples agents from each other. Useful for asynchronous workflows.
- Vector memory: shared semantic memory of past actions and observations. Agents retrieve relevant memories on each turn. Closer to the "long-term memory" of agent systems but harder to reason about.
Most production systems use a workspace. It’s legible, durable, and easy to snapshot for replay/debugging.
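A workspace with built-in snapshots is small enough to sketch directly. This is one possible shape, not a standard API: the class and method names are made up, and a real version would persist history rather than keep it in memory.

```python
import copy
import json

class Workspace:
    """Shared JSON-serialisable state with a snapshot after every write."""

    def __init__(self):
        self.state: dict = {}
        self.history: list[dict] = []  # one snapshot per write, for replay

    def write(self, agent: str, key: str, value) -> None:
        self.state[key] = value
        # Deep-copy so later writes can't mutate earlier snapshots.
        self.history.append({"agent": agent, "state": copy.deepcopy(self.state)})

    def read(self, key: str, default=None):
        return self.state.get(key, default)

    def replay(self) -> str:
        # Dump every intermediate state, attributed to the agent that wrote it.
        return json.dumps(self.history, indent=2)
```

The snapshot history is the payoff: when an answer comes out wrong, `replay()` shows exactly which agent wrote which value, in order, without re-running anything.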
The cost of coordination
Multi-agent isn’t free. Three real costs:
- Latency: sequential agents stack up. A 5-step manager+worker plan with 1s per call is 5s end-to-end. Parallelise where possible.
- Tokens: each agent re-receives some context. Total tokens can be 3-5x a single-agent solution.
- Coordination failures: the manager misroutes; a worker hallucinates and corrupts the workspace; the answer is wrong despite each agent looking fine in isolation.
Multi-agent makes sense when the task complexity exceeds what one agent can hold. Below that threshold, the coordination tax dominates the win.
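Where subtasks are genuinely independent, the latency cost above can be cut by dispatching workers concurrently. A minimal sketch using `asyncio`, with `asyncio.sleep` standing in for a model or API call (worker names are illustrative):

```python
import asyncio

async def worker(name: str, delay: float = 0.1) -> str:
    await asyncio.sleep(delay)  # stands in for one model/API call
    return f"{name}: done"

async def run_parallel(names: list[str]) -> list[str]:
    # Independent subtasks run concurrently: wall time is roughly one
    # call's latency, not the sum of all of them.
    return await asyncio.gather(*(worker(n) for n in names))

results = asyncio.run(run_parallel(["retrieve", "summarise", "validate"]))
```

`asyncio.gather` preserves input order, so the manager can still match results back to subtasks by position. Sequential steps with real data dependencies, of course, cannot be parallelised this way.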
Where multi-agent fails today
Three failure modes worth planning for:
- Agents lying to each other. A worker reports success; the manager believes it; downstream actions assume success. Build verification: results must be checkable, not just reported.
- Cascading hallucinations. Worker A invents a fact; worker B uses it as input; the answer is confidently wrong. Mitigate with grounded retrieval at every step that needs facts.
- Infinite manager loops. The manager keeps re-decomposing without producing output. Hard limits on iteration count. Force termination at N steps.
Where to start
Don’t build multi-agent on day one. Build a single agent that does the whole task badly. Identify which subtasks it’s worst at. Pull those out as workers. Iterate.
Concretely: ship version 1 as a single agent with a long prompt and 6 tools. When that has been in production for a month and you have eval data showing where it fails, refactor into 2-3 agents. Don’t architect a 7-agent hierarchy on day one based on a hunch. The actual decomposition rarely matches the planned one.