Emergent Capabilities: Real or Mirage?
Some abilities seem to appear suddenly as models scale up. Are they really emergent, or is the metric fooling us? The honest 2026 answer.
The original claim
Wei et al. (2022) reported that certain abilities are absent from small models, then appear suddenly past some scale threshold. Multi-step arithmetic, in-context learning of new tasks, and instruction following all looked discontinuous as a function of training compute.
The measurement-artefact rebuttal
Schaeffer et al. (2023) pushed back: many emergence curves are an artefact of the metric. Exact-match accuracy on multi-digit arithmetic sits near 0% until the model is good enough to get every digit right, then jumps to high accuracy. Switch to per-digit accuracy and the curve is smooth.
Their thesis: under continuous metrics (likelihood, partial credit), emergence often vanishes. The model was improving smoothly all along; the threshold-based metric hid it.
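The mechanism is easy to see with a toy calculation. Suppose per-digit accuracy p improves smoothly with scale; exact-match on a d-digit answer requires all d digits correct, so it scores p^d, which stays near zero and then shoots up. This is an illustrative sketch, not Schaeffer et al.'s actual analysis; the numbers are hypothetical.

```python
# Sketch: how a threshold metric manufactures "emergence" from smooth
# improvement. Per-digit accuracy p rises smoothly across hypothetical
# model scales; exact-match on a d-digit answer requires every digit
# correct (assuming independent per-digit errors), so it scores p**d.

def exact_match(p: float, d: int = 10) -> float:
    """Probability that all d digits are correct: p raised to the d."""
    return p ** d

# Smoothly increasing per-digit accuracy (the continuous metric)
per_digit = [0.50, 0.60, 0.70, 0.80, 0.90, 0.95, 0.99]

for p in per_digit:
    # Continuous metric climbs gently; exact-match hugs zero, then jumps.
    print(f"per-digit {p:.2f} -> exact-match {exact_match(p):.4f}")
```

The continuous metric improves in even steps, while exact-match is below 3% for the first three scales and above 90% at the last one: the same underlying progress, two very different curves.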
Capabilities that look genuinely emergent
The rebuttal doesn’t cover everything. Several capabilities resist smooth explanation:
- Tool use: small models can’t reliably emit JSON tool calls; large ones can.
- Self-correction in chain-of-thought: smaller models double down on errors; larger ones revise.
- Theory of mind in social reasoning: appears around frontier scale, hard to grade with continuous metrics.
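Part of why these capabilities resist the metric-artefact explanation is that their natural grading is inherently binary. A JSON tool call either parses and matches the expected shape or it does not; there is no obvious partial-credit score to smooth the curve with. A minimal sketch, with a hypothetical tool-call schema chosen purely for illustration:

```python
# Sketch: grading a tool call is naturally all-or-nothing. The schema
# (a dict with "tool" and "arguments" keys) is hypothetical; real
# frameworks define their own formats.
import json

def grade_tool_call(output: str) -> bool:
    """Binary grade: does the output parse as JSON with the expected keys?"""
    try:
        call = json.loads(output)
    except json.JSONDecodeError:
        return False
    return isinstance(call, dict) and {"tool", "arguments"} <= call.keys()

print(grade_tool_call('{"tool": "search", "arguments": {"q": "llms"}}'))  # True
print(grade_tool_call('{"tool": "search", "arguments":'))                 # False
```

Unlike multi-digit arithmetic, there is no per-digit analogue here: a truncated or malformed call is useless to the caller, so the threshold is in the task itself, not just the metric.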
For these, the 2025-2026 consensus is roughly: there is real discontinuity in some cognitive-style abilities, even after metric correction. The claim is weaker than "everything is emergent" but stronger than "emergence is an illusion."
Why this matters for forecasting
If everything emerges smoothly, you can extrapolate from current models to future ones with confidence. If important abilities emerge in jumps, the next model could surprise you in either direction.
The practical stance: don’t over-bet on continuity. The capability you couldn’t buy at any price last year may be cheap on the next API release. Conversely, the “just over the horizon” ability may stay there longer than scaling laws predict.