World Models and Planning
A world model predicts what happens next given an action. Combine that with a planner and you get an agent that can reason about consequences before acting.
The idea
A world model maps a (state, action) pair to a predicted next state. The agent plans with it: imagine taking action A, see which state results, evaluate that state, and repeat over a short horizon. Mental simulation, externalised.
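The imagine-evaluate-repeat loop can be sketched in a few lines of Python. This is a minimal illustration, not any particular system's method. It assumes a hand-written toy dynamics function in place of a learned model, and a planner that exhaustively tries every short action sequence. The names `world_model`, `score`, `plan` and `run_episode`, the point-on-a-line dynamics, and the scoring weights are all hypothetical.

```python
from itertools import product

ACTIONS = (-1, 0, 1)  # hypothetical action set: decelerate, coast, accelerate


def world_model(state, action):
    """Toy stand-in for a learned model: a point moving on a line.

    state is (position, velocity); action is an acceleration.
    A real world model would be learned from data, not hand-written.
    """
    pos, vel = state
    vel = vel + action
    pos = pos + vel
    return (pos, vel)


def score(state, goal):
    """Higher is better: close to the goal and moving slowly."""
    pos, vel = state
    return -abs(pos - goal) - 0.5 * abs(vel)


def plan(state, goal, horizon=5):
    """Imagine every action sequence of length `horizon` and return the
    first action of the best-scoring one (exhaustive search)."""
    best_total, best_first = float("-inf"), 0
    for seq in product(ACTIONS, repeat=horizon):
        imagined = state
        total = 0.0
        for action in seq:
            imagined = world_model(imagined, action)  # simulate, don't act
            total += score(imagined, goal)
        if total > best_total:
            best_total, best_first = total, seq[0]
    return best_first


def run_episode(start=(0, 0), goal=10, steps=20):
    """Replan at every step and execute only the first planned action.

    Assumption: the real environment behaves exactly like the model,
    which is rarely true in practice.
    """
    state = start
    for _ in range(steps):
        action = plan(state, goal)
        state = world_model(state, action)
    return state
```

Calling `plan((0, 0), goal=10)` asks the agent which action looks best from rest at position 0. Exhaustive search only works here because the action set and horizon are tiny; with larger spaces, planners typically sample or optimise candidate sequences instead.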
Research lines
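- Dreamer: RL agents that learn a latent world model and train their policy on imagined rollouts. Strong sample efficiency.
- JEPA: predict in latent (representation) space, not pixel space. Its proponents argue this is more efficient and more robust.
- Genie: video-frame world models that learn from passive observation of unlabelled video, with no action labels required.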
Why it matters
Today’s LLM agents reason in language, and they simulate physical or temporal consequences poorly. World models give agents a way to think about non-linguistic dynamics: physics, timing, cause-and-effect chains. This is likely the next major capability unlock for embodied AI.