AI & ML · Beginner · By Samson Tanimawo, PhD · Published Mar 18, 2025 · 9 min read

LLM Hallucinations: Why Models Make Things Up

A hallucination is the model emitting text that sounds confident and is wrong. It is not a bug. It is the predictable consequence of how next-token prediction works.

What a hallucination actually is

A hallucination is the LLM stating something untrue with the same confidence it states something true. Asked “Who won the 1996 NBA Finals?”, it might answer correctly (the Chicago Bulls) or it might answer the Houston Rockets, with identical conviction in both cases.

The word is misleading. The model isn’t hallucinating in any psychological sense. It’s producing the most-probable next tokens given its training distribution, regardless of whether those tokens correspond to reality.

Why next-token prediction hallucinates

The model was trained to predict the next likely token. That objective doesn’t distinguish between “this is what people typically wrote next when this question came up” and “this is true.” The two correlate strongly for well-known facts; they decouple sharply for niche ones.

If you ask about a famous person, billions of training tokens reinforce the same correct facts. If you ask about an obscure person, the model has seen scattered, possibly conflicting fragments. It still produces an answer, because predicting some next token is what the architecture does. The result is plausible-sounding fabrication.

Three architectural facts make this worse. The model has no separate “I don’t know” state. The temperature parameter randomises sampling, so even when the model leans toward correctness, it sometimes picks a less probable wrong token. And the model can’t check its own claims against an external source unless explicitly given one.
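
The temperature effect is easy to see in a toy sampler. The sketch below is illustrative only: the three-token vocabulary and the logits are invented for the example, not taken from any real model.

```python
import math
import random

def sample_with_temperature(logits, temperature):
    """Softmax the logits at the given temperature, then draw one
    token index from the resulting distribution."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    r = random.random()
    cumulative = 0.0
    for i, p in enumerate(probs):
        cumulative += p
        if r < cumulative:
            return i
    return len(probs) - 1

# Toy next-token logits: index 0 is the "correct" continuation.
logits = [2.0, 0.5, 0.1]
random.seed(0)
low = [sample_with_temperature(logits, 0.2) for _ in range(1000)]
high = [sample_with_temperature(logits, 2.0) for _ in range(1000)]
# Low temperature almost always picks token 0; high temperature
# spreads probability onto the less likely (possibly wrong) tokens.
print(low.count(0), high.count(0))
```

Run it and the low-temperature samples land on token 0 almost every time, while the high-temperature samples drift onto the other tokens roughly half the time: the same mechanism that lets a model occasionally emit a less probable, wrong continuation.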

The three flavours of hallucination you’ll see

The three families have different remedies, which is why diagnosing which kind you’re seeing matters.

How to detect them in your application

Three practical signals:
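
One signal that is cheap to implement is answer agreement across repeated samples of the same prompt (self-consistency). A minimal sketch, assuming a hypothetical `ask_model` callable that wraps your LLM API; here it is stubbed with canned answers so the example runs on its own:

```python
from collections import Counter

def consistency_check(ask_model, prompt, n=5, threshold=0.6):
    """Sample the same prompt n times and measure answer agreement.
    Models tend to be consistent about facts reinforced by many
    training examples and inconsistent about facts they are
    fabricating, so low agreement is a useful warning sign."""
    answers = [ask_model(prompt) for _ in range(n)]
    top_answer, count = Counter(answers).most_common(1)[0]
    agreement = count / n
    return top_answer, agreement, agreement >= threshold

# Stand-in for a real API call so the sketch is self-contained.
canned = iter(["Bulls", "Bulls", "Rockets", "Bulls", "Bulls"])
answer, agreement, ok = consistency_check(
    lambda prompt: next(canned), "Who won the 1996 NBA Finals?"
)
print(answer, agreement, ok)  # Bulls 0.8 True
```

The trade-off is cost: n samples means n API calls per question, so this fits spot-checking and evaluation pipelines better than every production request.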

Four mitigations that work in production

  1. Retrieval-augmented generation (RAG). Don’t ask the model to recall facts from training. Retrieve them at runtime from a trusted source and inject them into the prompt. The model becomes a synthesiser, not an oracle.
  2. Force citations. Require the model to quote the retrieved source for every factual claim. If it can’t produce a quote, it can’t make the claim.
  3. Lower temperature. A temperature of 0.2–0.5 produces more conservative sampling. The model picks higher-probability tokens, which correlates (imperfectly) with safer ones. Don’t go all the way to 0; some randomness improves reasoning.
  4. Constrain outputs. JSON schemas, regex validation, or finite enumerations limit what the model can say. A model forced to choose from {yes, no, unknown} can’t hallucinate a fourth option.
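
Mitigation 4 is small enough to sketch directly. The `constrained_answer` helper below is illustrative, not a real library API:

```python
ALLOWED = frozenset({"yes", "no", "unknown"})

def constrained_answer(raw_output, allowed=ALLOWED):
    """Accept the model's text only if it maps onto a closed answer
    set; anything else collapses to 'unknown'. A model whose output
    is validated this way cannot hallucinate a fourth option."""
    cleaned = raw_output.strip().lower().rstrip(".")
    return cleaned if cleaned in allowed else "unknown"

print(constrained_answer("Yes."))                            # yes
print(constrained_answer("Probably the Rockets, actually"))  # unknown
```

In practice you would pair this with a JSON schema or the provider’s structured-output mode, but the principle is the same: validation happens outside the model, so a fabricated answer fails closed rather than reaching the user.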

None of these eliminates hallucinations on its own. Combined, they can reduce the rate by an order of magnitude.

When hallucination is the feature

Creative writing, brainstorming, code generation under loose specs, hypothesis generation: in these settings the model’s ability to combine training patterns into novel outputs is exactly what you want. “Make up a story” isn’t hallucination; it’s the assignment.

The same mechanism that produces hallucinated case law also produces a plausible plot for a novel chapter. The mechanism is neutral. Whether it’s a bug or a feature depends entirely on whether your application demands ground truth.

Build with the assumption that the model will hallucinate. Architect verification into your system, and accept the residual risk for use cases where verification isn’t possible. That’s the working stance for production LLMs in 2025.