AI & ML Advanced By Samson Tanimawo, PhD Published Jan 20, 2026

Multi-Step Tool Use: The Planning Problem

A single tool call is easy. A chain of five, where each call depends on the last, is among the hardest open problems in agent design.

Where it breaks

Single tool calls work because the prompt fully describes the situation. Five calls in sequence don’t, because each call’s output adds information that the model has to integrate into its plan.

The classic failure: the agent gets results from step 2 that should change its plan, but it sticks to the original plan from step 1 because that plan is the loudest thing in context.

Plan-then-execute

One approach: the model writes the full plan up front, then executes each step. Predictable; easy to audit. Doesn’t handle surprises.

Best for tasks with knowable structure: data extraction, deterministic workflows, scheduled jobs.
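The shape of plan-then-execute can be sketched in a few lines. This is a minimal illustration, not a real framework: `call_model` is a hypothetical stand-in for an LLM client, and the tool registry is invented for the example.

```python
from typing import Callable

def call_model(prompt: str) -> str:
    # Hypothetical stand-in: a real implementation would call an LLM here.
    # For this sketch it always returns a fixed three-step plan.
    return "fetch_data\nparse_rows\nsummarize"

# Toy tool registry; each tool transforms the running state string.
TOOLS: dict[str, Callable[[str], str]] = {
    "fetch_data": lambda state: state + " raw",
    "parse_rows": lambda state: state + " parsed",
    "summarize": lambda state: state + " summary",
}

def plan_then_execute(task: str) -> str:
    # The plan is written once, up front, and never revised.
    plan = call_model(f"Plan steps for: {task}").splitlines()
    state = task
    for step in plan:
        state = TOOLS[step](state)  # execute each planned step in order
    return state
```

The plan is an auditable artifact you can log or show to a reviewer before execution starts, which is exactly why this pattern suits deterministic workflows and why it cannot react when step 2 returns something unexpected.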

Reactive ReAct

The opposite: no upfront plan. Each step looks at current state and decides the next action. Handles surprises gracefully. Tends to wander on long tasks.

Best for exploratory tasks where the steps aren’t known: debugging, research, customer support.
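A reactive loop inverts that structure: no plan object at all, just a decide-act-observe cycle. The sketch below assumes a hypothetical `decide` function standing in for the model; the step budget is the only thing keeping it from wandering forever.

```python
from typing import Callable

def react_loop(
    task: str,
    decide: Callable[[dict], tuple[str, str]],
    tools: dict[str, Callable[[str], str]],
    max_steps: int = 10,
):
    """Decide the next action from current state, one step at a time."""
    state = {"task": task, "observations": []}
    for _ in range(max_steps):
        action, arg = decide(state)  # model inspects state fresh each step
        if action == "finish":
            return arg
        obs = tools[action](arg)     # run the chosen tool
        state["observations"].append(obs)
    return None  # wandered past the step budget without finishing
```

Because each decision sees the latest observations, a surprising tool result changes the very next action, with no stale plan to fight against. The cost is that nothing global constrains the trajectory, which is where the wandering on long tasks comes from.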

Hybrid

The pattern that’s emerged in production: write a high-level plan, execute reactively within each step, replan if results contradict assumptions.
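One way that hybrid loop can be wired together is sketched below. The three callbacks (`make_plan`, `execute_step`, `plan_still_valid`) are hypothetical names for this example, not any framework's API; the key mechanic is that a failed validity check regenerates only the remaining steps.

```python
from typing import Callable

def hybrid(
    task: str,
    make_plan: Callable[..., list[str]],
    execute_step: Callable[[str, list], object],
    plan_still_valid: Callable[[list[str], list], bool],
    max_replans: int = 3,
):
    plan = make_plan(task, context=None)  # high-level plan up front
    results: list = []
    replans = 0
    i = 0
    while i < len(plan):
        result = execute_step(plan[i], results)  # reactive within the step
        results.append(result)
        # Replan if results contradict the plan's assumptions,
        # keeping the already-executed prefix intact.
        if not plan_still_valid(plan, results) and replans < max_replans:
            plan = plan[: i + 1] + make_plan(task, context=results)
            replans += 1
        i += 1
    return results
```

The `max_replans` cap matters: without it, a model that keeps invalidating its own plans loops indefinitely, which is the hybrid version of the wandering problem.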

Frameworks like AutoGPT, BabyAGI, and the modern crop of agent SDKs implement variants. None is perfect; all work better than pure-plan or pure-react in their target domains.

The 2026 ceiling

Production agents top out at roughly 10-20 sequential tool calls before reliability collapses. Beyond that, the cumulative probability of a wrong call dominates.
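The arithmetic behind that ceiling is simple compounding: if each call succeeds independently with probability p, a chain of n calls succeeds with probability p^n. The numbers below assume a 95% per-call success rate, chosen for illustration.

```python
def chain_success(p: float, n: int) -> float:
    """Probability that n independent tool calls all succeed."""
    return p ** n

# Even a 95%-reliable call decays fast when chained:
for n in (1, 5, 10, 20):
    print(n, round(chain_success(0.95, n), 2))
# At 20 calls, the chain succeeds only about a third of the time.
```

This is why the 10-20 call range is a wall rather than a gentle slope, and why the mitigations below all work by either shortening chains or breaking the independence assumption with checks.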

Mitigations: smaller, focused agents that hand off to others (multi-agent); intermediate verification steps; hard limits on iteration count.
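Two of those mitigations, intermediate verification and a hard iteration cap, can share one wrapper. This is a minimal sketch with hypothetical `step_fn` and `verify` callbacks; a real agent runtime would log and recover rather than raise.

```python
from typing import Callable

def run_with_guardrails(
    step_fn: Callable,      # advances state, returns (new_state, done)
    verify: Callable,       # checks an invariant on intermediate state
    max_iters: int = 10,    # hard limit on iteration count
):
    state = None
    for i in range(max_iters):
        state, done = step_fn(state)
        if not verify(state):  # fail fast instead of compounding the error
            raise RuntimeError(f"verification failed at step {i}")
        if done:
            return state
    raise RuntimeError("iteration limit reached")
```

Failing loudly at the first bad intermediate state is the point: a wrong step 3 caught immediately costs one retry, while the same error discovered at step 15 wastes the whole chain.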

The next breakthrough, if and when it comes, will likely be on this axis. Models that can plan, monitor, and replan reliably for hundreds of steps are the unlock for many real-world AI applications.