AI & ML Advanced · By Samson Tanimawo, PhD · Published Jan 13, 2026 · 7 min read

Agentic Reasoning: Tree of Thoughts, ReAct, and Reflexion

Three patterns for getting models to reason better at inference time. Each adds compute but unlocks problems the model couldn’t solve in a single pass.

Chain-of-thought, briefly

Chain-of-thought prompts the model to reason step-by-step before answering. Helps on math and logic. Linear: one chain, one answer.
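The pattern fits in a few lines. A minimal sketch, where `call_model` is a hypothetical stand-in for any LLM API, stubbed here so the snippet is self-contained:

```python
def call_model(prompt: str) -> str:
    # Stub: a real implementation would call an LLM API here.
    return "Step 1: ... Step 2: ... Answer: 42"

def chain_of_thought(question: str) -> str:
    # The only change from a plain prompt is the instruction to reason first.
    prompt = f"{question}\nThink step by step, then give the final answer."
    return call_model(prompt)

print(chain_of_thought("What is 6 * 7?"))
```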

Tree of Thoughts

Generate multiple branches at each reasoning step. Explore each. Score them. Keep the best.

Concretely: at step N, produce K candidate next-steps. Evaluate each. Discard losers. Recurse. The model effectively runs a beam search, breadth-first with pruning, through reasoning space.
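The recursion above can be sketched as a beam search. `propose` and `score` are hypothetical stand-ins for LLM calls that generate K candidate next steps and rate a partial chain:

```python
from typing import Callable

def tree_of_thoughts(
    problem: str,
    propose: Callable[[str, list], list],  # K candidate next steps for a chain
    score: Callable[[str, list], float],   # value of a partial chain
    depth: int = 3,
    beam: int = 2,
) -> list:
    frontier = [[]]  # each entry is a partial chain of thought
    for _ in range(depth):
        candidates = []
        for chain in frontier:
            for step in propose(problem, chain):  # branch: K candidates
                candidates.append(chain + [step])
        # Keep only the `beam` highest-scoring chains; discard losers.
        candidates.sort(key=lambda c: score(problem, c), reverse=True)
        frontier = candidates[:beam]
    return frontier[0]  # best chain found
```

With `beam=1` this degenerates to greedy chain-of-thought; larger beams trade compute for coverage of the search space.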

Cost scales with K at every step and compounds with depth. The win on hard problems can be 20-50 percentage points; on easy ones, ToT is wasted compute.

ReAct

ReAct alternates reasoning steps with action steps. Reason about what to do; do it (call a tool); observe the result; reason about the next step.

This is the canonical agent pattern. Pure-reasoning models can’t check facts or run code; pure-acting models can’t plan. ReAct ties them together.
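A minimal ReAct loop might look like this. The `llm` and the emission format ("Action: tool input" / "Answer: text") are assumptions for the sketch, not a fixed protocol:

```python
def react(question, llm, tools, max_steps=5):
    transcript = f"Question: {question}"
    for _ in range(max_steps):
        reply = llm(transcript)              # reason: model decides next move
        transcript += "\n" + reply
        if reply.startswith("Answer:"):
            return reply[len("Answer:"):].strip()
        if reply.startswith("Action:"):
            _, tool, arg = reply.split(" ", 2)
            observation = tools[tool](arg)   # act: run the named tool
            transcript += f"\nObservation: {observation}"  # observe: feed back
    return None  # step budget exhausted
```

The transcript accumulates reason/act/observe triples, so each new reasoning step sees every prior observation.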

Reflexion

The model does its work, then critiques it. The critique becomes part of the prompt for the next attempt. The model effectively self-grades.

Effective when the task has verifiable outcomes (tests pass or fail, math has a right answer). Less effective on subjective tasks where the model can't reliably judge its own output.
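The loop can be sketched with a verifiable checker. `llm` and `check` are hypothetical stand-ins; in practice `check` would run a test suite or verify a math answer:

```python
def reflexion(task, llm, check, max_tries=3):
    memory = ""  # accumulated self-critiques carried across attempts
    attempt = ""
    for _ in range(max_tries):
        attempt = llm(f"Task: {task}\n{memory}")
        ok, feedback = check(attempt)  # verifiable signal: tests, math, etc.
        if ok:
            return attempt
        # The critique becomes part of the prompt for the next attempt.
        memory += f"Previous attempt failed: {feedback}\n"
    return attempt  # best effort after max_tries
```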

Combining them

Modern reasoning agents use all three: ReAct for the outer loop (think, act, observe), ToT for hard intermediate decisions, Reflexion for verification.
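One way the three can nest, as a sketch: ReAct drives the outer loop, ToT is invoked for hard choices, Reflexion gates the final answer. Every callable here is a hypothetical stand-in:

```python
def agent_turn(question, reason_or_act, expand_hard_step, verify, max_steps=8):
    state = [question]
    for _ in range(max_steps):
        move = reason_or_act(state)          # ReAct: think or call a tool
        if move["hard"]:
            move = expand_hard_step(state)   # ToT: branch and keep the best
        state.append(move["content"])
        if move["final"]:
            ok, critique = verify(state)     # Reflexion: self-grade the answer
            if ok:
                return move["content"]
            state.append(critique)           # revise and keep going
    return state[-1]
```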

Patterns like these are what underpin the "reasoning models" (OpenAI's o1, Claude's extended thinking, DeepSeek-R1), though each vendor's exact recipe differs. The model doesn't just generate; it explores, acts, evaluates, and revises, all within a single "turn" from the user's perspective.