CL | May 20, 2025

Pierce the Mists, Greet the Sky: Decipher Knowledge Overshadowing via Knowledge Circuit Analysis

Haoming Huang, Yibo Yan, Jiahao Huo, Xin Zou, Xinfeng Li, Kun Wang, Xuming Hu

arXiv:2505.14406v48.33 citations | h-index: 17 | EMNLP

Originality: Incremental advance

AI Analysis

The paper addresses a specific hallucination issue in LLMs and offers a new method for understanding and potentially mitigating it, but the advance is incremental: it builds on existing observations without broad state-of-the-art impact.

The paper tackles the problem of knowledge overshadowing in Large Language Models, where one piece of knowledge masks another, causing hallucinations, and introduces PhantomCircuit, a framework that effectively analyzes and detects this phenomenon through knowledge circuit analysis.

Large Language Models (LLMs), despite their remarkable capabilities, are hampered by hallucinations. A particularly challenging variant, knowledge overshadowing, occurs when one piece of activated knowledge inadvertently masks another relevant piece, leading to erroneous outputs even with high-quality training data. Current understanding of overshadowing is largely confined to inference-time observations, lacking deep insights into its origins and internal mechanisms during model training. Therefore, we introduce PhantomCircuit, a novel framework designed to comprehensively analyze and detect knowledge overshadowing. By innovatively employing knowledge circuit analysis, PhantomCircuit dissects the function of key components in the circuit and how the attention pattern dynamics contribute to the overshadowing phenomenon and its evolution throughout the training process. Extensive experiments demonstrate PhantomCircuit's effectiveness in identifying such instances, offering novel insights into this elusive hallucination and providing the research community with a new methodological lens for its potential mitigation.
