CL, May 21

Hallucination as Commitment Failure: Larger LLMs Misfire Despite Knowing the Answer

arXiv:2605.22007
AI Analysis

For researchers and developers of large language models, this work identifies a single mechanism behind both helpfulness and confident hallucination: instruction tuning sharpens answer commitment as models scale.

The paper challenges the assumption that hallucination in LLMs stems solely from missing knowledge. It shows that 16-47% of Instruct-model hallucinations occur when the correct answer concept already carries substantial probability mass in the model's generation-time distribution, and that this rate rises with model scale. What separates the two outcomes is how that mass is distributed: correct generations concentrate probability on a single surface form, while hallucinations disperse it across alternatives.

Hallucination is often viewed as a direct consequence of missing knowledge: a model answers incorrectly when the correct answer is absent from its generation-time distribution, and correctly when it is present. We test this assumption by introducing a semantic notion of answer availability that aggregates token-level variants expressing the same answer concept, and asks whether the correct concept is already available at the moment the model commits to an answer. Across Qwen and Llama models from 0.8B to 72B in both Instruct and Base variants, 16-47% of Instruct hallucinations occur with substantial probability mass already on the correct concept, and the rate rises monotonically with scale. Comparing such failures against correct generations with matched semantic support, the distinguishing factor is not whether the correct concept is represented, but how its probability is distributed: correct generations concentrate mass on a single surface form, hallucinations disperse it across alternatives. The same sharpening asymmetry extends across multi-token generation and is detectable in pre-generation hidden states. Together, these results identify a single mechanism: instruction tuning sharpens answer commitment with scale, making helpfulness and confident hallucination two consequences of the same underlying disposition.
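
The distinction between semantic availability and surface-form concentration can be made concrete with a toy calculation. The sketch below is only an illustration under stated assumptions, not the paper's metric: the function names, the choice of "share held by the top variant" as a concentration measure, and the probability values are all hypothetical. It shows two next-token distributions with the same total mass on the correct concept, where greedy decoding commits to the correct answer in one case and to a wrong answer in the other.

```python
from typing import Dict, Iterable


def semantic_availability(probs: Dict[str, float], variants: Iterable[str]) -> float:
    """Total probability mass on all surface forms of one answer concept."""
    return sum(probs.get(v, 0.0) for v in set(variants))


def concentration(probs: Dict[str, float], variants: Iterable[str]) -> float:
    """Share of the concept's mass held by its single most likely surface form.

    Hypothetical measure chosen for illustration; the abstract does not
    specify how concentration is quantified.
    """
    masses = [probs.get(v, 0.0) for v in set(variants)]
    total = sum(masses)
    return max(masses) / total if total > 0 else 0.0


def greedy_choice(probs: Dict[str, float]) -> str:
    """Surface form that greedy decoding would commit to."""
    return max(probs, key=probs.get)


# Made-up surface forms of the correct concept (assumption for the example).
correct_variants = ["Paris", " Paris", "paris"]

# Made-up distributions: both place 0.60 of the mass on the correct concept.
concentrated = {"Paris": 0.55, " Paris": 0.03, "paris": 0.02, "Lyon": 0.40}
dispersed = {"Paris": 0.22, " Paris": 0.20, "paris": 0.18, "Lyon": 0.40}

for name, dist in [("concentrated", concentrated), ("dispersed", dispersed)]:
    print(
        name,
        "availability:", round(semantic_availability(dist, correct_variants), 3),
        "concentration:", round(concentration(dist, correct_variants), 3),
        "greedy:", repr(greedy_choice(dist)),
    )
```

In the dispersed case the wrong answer wins under greedy decoding even though the correct concept holds more total mass, which is the kind of failure the abstract describes as occurring despite the answer being available.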

