Hallucination is a Consequence of Space-Optimality: A Rate-Distortion Theorem for Membership Testing

arXiv:2602.00906 · 45.2 · 1 citations · h-index: 2

AI Analysis

The paper offers a theoretical explanation for hallucination in AI models, a critical issue for users who rely on accurate information. The contribution is incremental in that it builds on existing rate-distortion and information-theoretic frameworks.

The paper formalizes hallucination in large language models as a membership testing problem. It shows that, even with optimal training and perfect data, the information-theoretically optimal strategy under limited capacity assigns high confidence to some non-facts, which is exactly what hallucination looks like. The theory is validated on synthetic data.

Large language models often hallucinate with high confidence on "random facts" that lack inferable patterns. We formalize the memorization of such facts as a membership testing problem, unifying the discrete error metrics of Bloom filters with the continuous log-loss of LLMs. By analyzing this problem in the regime where facts are sparse in the universe of plausible claims, we establish a rate-distortion theorem: the optimal memory efficiency is characterized by the minimum KL divergence between score distributions on facts and non-facts. This theoretical framework provides a distinctive explanation for hallucination: even with optimal training, perfect data, and a simplified "closed world" setting, the information-theoretically optimal strategy under limited capacity is not to abstain or forget, but to assign high confidence to some non-facts, resulting in hallucination. We validate this theory empirically on synthetic data, showing that hallucinations persist as a natural consequence of lossy compression.
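The Bloom-filter side of this analogy can be made concrete with a small sketch. The code below is not the paper's construction or experiment; it is a minimal illustration, assuming a textbook Bloom filter, hypothetical item names (`fact-i`, `nonfact-j`), and arbitrary parameter choices. It stores a set of "facts" under a fixed bit budget, confirms that every fact is still accepted, and measures how often unseen "non-facts" are accepted with full confidence. The helper `bernoulli_kl_bits` computes the KL divergence between two Bernoulli accept/reject distributions; for the one-sided case (facts always accepted, non-facts accepted with rate ε) it reduces to log2(1/ε), the classical bits-per-item lower bound for approximate membership in the sparse regime. Treating this as the special case of the abstract's KL characterization is an assumption of this sketch.

```python
import hashlib
import math


class BloomFilter:
    """Textbook Bloom filter; one byte per bit slot for simplicity."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)

    def _positions(self, item):
        # Derive num_hashes deterministic positions from salted SHA-256.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item):
        return all(self.bits[pos] for pos in self._positions(item))


def false_positive_rate(num_facts, bits_per_fact, num_hashes=4, num_probes=20000):
    """Store the facts, then return the fraction of non-facts that are accepted."""
    bf = BloomFilter(num_facts * bits_per_fact, num_hashes)
    for i in range(num_facts):
        bf.add(f"fact-{i}")
    # A Bloom filter never forgets a stored item: no false negatives.
    assert all(f"fact-{i}" in bf for i in range(num_facts))
    hits = sum(f"nonfact-{j}" in bf for j in range(num_probes))
    return hits / num_probes


def bernoulli_kl_bits(p, q):
    """KL divergence D(Bern(p) || Bern(q)) in bits; requires 0 < q < 1."""
    total = 0.0
    for a, b in ((p, q), (1.0 - p, 1.0 - q)):
        if a > 0:
            total += a * math.log2(a / b)
    return total


if __name__ == "__main__":
    for bits_per_fact in (2, 4, 8, 16):
        fpr = false_positive_rate(1000, bits_per_fact)
        # Bits per fact that the one-sided lower bound would require for this rate.
        bound = bernoulli_kl_bits(1.0, fpr) if fpr > 0 else float("inf")
        print(f"budget={bits_per_fact:2d} bits/fact  "
              f"false-positive rate={fpr:.4f}  log2(1/eps)={bound:.2f}")
```

Running the script prints one line per memory budget. The expected qualitative behavior is that shrinking the budget raises the rate at which non-facts are accepted, while stored facts are never rejected; that mirrors the abstract's claim that limited capacity leads to confident errors on non-facts rather than abstention or forgetting. A Bloom filter is not space-optimal, so its budget sits above the log2(1/ε) figure printed alongside it.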
