CL · Feb 22, 2025

The Law of Knowledge Overshadowing: Towards Understanding, Predicting, and Preventing LLM Hallucination

arXiv:2502.16143v1 · 37 citations · h-index: 24 · ACL
Originality: Highly original
AI Analysis

This paper addresses the persistent problem of factual inaccuracy in LLMs, which matters to users who depend on reliable text generation. It offers both theoretical insight and a practical mitigation, though it builds incrementally on existing hallucination research.

The paper tackles hallucination in large language models by proposing the concept of knowledge overshadowing, in which dominant knowledge obscures less prominent facts during generation. It introduces a log-linear law that predicts hallucination rates from knowledge popularity, knowledge length, and model size, and a new decoding strategy, CoDa, that improves factuality by up to 27.9% on benchmarks.

Hallucination is a persistent challenge in large language models (LLMs), where even with rigorous quality control, models often generate distorted facts. This paradox, in which error generation continues despite high-quality training data, calls for a deeper understanding of the underlying LLM mechanisms. To address it, we propose a novel concept: knowledge overshadowing, where a model's dominant knowledge can obscure less prominent knowledge during text generation, causing the model to fabricate inaccurate details. Building on this idea, we introduce a novel framework to quantify factual hallucinations by modeling knowledge overshadowing. Central to our approach is the log-linear law, which predicts that the rate of factual hallucination increases linearly with the logarithmic scale of (1) Knowledge Popularity, (2) Knowledge Length, and (3) Model Size. The law provides a means to preemptively quantify hallucinations, offering foresight into their occurrence even before model training or inference. Building on the overshadowing effect, we propose a new decoding strategy, CoDa, to mitigate hallucinations, which notably enhances model factuality on Overshadow (27.9%), MemoTrap (13.1%) and NQ-Swap (18.3%). Our findings not only deepen understanding of the underlying mechanisms behind hallucinations but also provide actionable insights for developing more predictable and controllable language models.
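
To make the log-linear law concrete, here is a minimal sketch of one way such a relationship could be fitted. It assumes the rate takes the additive form rate ≈ a·log(P) + b·log(L) + c·log(S) + d, which is only one plausible reading of the abstract and not necessarily the paper's exact parameterization. The names `fit_log_linear` and `predict_rate`, the inputs P, L, and S, and all coefficients are hypothetical, and the data is synthetic, generated purely for illustration.

```python
import numpy as np

def fit_log_linear(popularity, length, model_size, rate):
    """Least-squares fit of rate ~ a*log(P) + b*log(L) + c*log(S) + d.

    The additive form is an assumption made for this sketch.
    """
    X = np.column_stack([
        np.log(popularity),
        np.log(length),
        np.log(model_size),
        np.ones(len(rate)),  # intercept column
    ])
    coef, *_ = np.linalg.lstsq(X, rate, rcond=None)
    return coef  # array [a, b, c, d]

def predict_rate(coef, popularity, length, model_size):
    """Evaluate the assumed log-linear form for the given inputs."""
    a, b, c, d = coef
    return (a * np.log(popularity)
            + b * np.log(length)
            + c * np.log(model_size)
            + d)

# Synthetic, noiseless data from made-up coefficients (not paper results).
rng = np.random.default_rng(0)
n = 200
P = rng.uniform(1, 100, n)     # hypothetical knowledge popularity
L = rng.uniform(1, 50, n)      # hypothetical knowledge length
S = rng.uniform(1e8, 1e10, n)  # hypothetical model size (parameters)
true_coef = np.array([0.05, 0.03, 0.02, 0.10])
R = predict_rate(true_coef, P, L, S)

coef = fit_log_linear(P, L, S, R)
```

Because the synthetic rates contain no noise, the fit should recover the made-up coefficients closely. With real measurements one would add noise handling and keep predictions within [0, 1], since a linear form in the logs is not bounded on its own.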

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
