AI CL CY
Dec 1, 2025

H-Neurons: On the Existence, Impact, and Origin of Hallucination-Associated Neurons in LLMs

Cheng Gao, Huimin Chen, Chaojun Xiao, Zhiyi Chen, Zhiyuan Liu, Maosong Sun

arXiv:2512.01797v220.813 citations
h-index: 41

Originality: Incremental advance

AI Analysis

This paper addresses the reliability of LLMs for users by providing neuron-level insight into hallucination mechanisms. Its advance is incremental, since it builds on prior macroscopic studies.

The paper tackles hallucinations in large language models by identifying a sparse subset of neurons (less than 0.1% of the total) that predicts hallucination occurrences and is causally linked to over-compliance behaviors. It traces these neurons back to pre-training.

Large language models (LLMs) frequently generate hallucinations -- plausible but factually incorrect outputs -- undermining their reliability. While prior work has examined hallucinations from macroscopic perspectives such as training data and objectives, the underlying neuron-level mechanisms remain largely unexplored. In this paper, we conduct a systematic investigation into hallucination-associated neurons (H-Neurons) in LLMs from three perspectives: identification, behavioral impact, and origins. Regarding their identification, we demonstrate that a remarkably sparse subset of neurons (less than $0.1\%$ of total neurons) can reliably predict hallucination occurrences, with strong generalization across diverse scenarios. In terms of behavioral impact, controlled interventions reveal that these neurons are causally linked to over-compliance behaviors. Concerning their origins, we trace these neurons back to the pre-trained base models and find that these neurons remain predictive for hallucination detection, indicating they emerge during pre-training. Our findings bridge macroscopic behavioral patterns with microscopic neural mechanisms, offering insights for developing more reliable LLMs.
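The abstract does not spell out how the sparse subset is identified, so the following is only a minimal sketch of one common way to look for a small predictive set of units: an L1-regularized logistic probe fitted by proximal gradient descent. It is an assumption, not the authors' procedure. The data is synthetic, and every name here (`make_synthetic_activations`, `fit_sparse_probe`, the planted indices, the penalty strength) is hypothetical.

```python
import numpy as np


def make_synthetic_activations(n_samples, n_neurons, planted, shift, rng):
    """Toy stand-in for per-response neuron activations (not real model data)."""
    y = rng.integers(0, 2, size=n_samples)         # 1 = hallucinated, 0 = faithful
    X = rng.standard_normal((n_samples, n_neurons))
    X[:, planted] += shift * (2 * y[:, None] - 1)  # planted neurons track the label
    return X, y


def fit_sparse_probe(X, y, l1_strength, n_iters=300):
    """L1-regularized logistic regression (no intercept) via proximal gradient."""
    n = X.shape[0]
    # The mean logistic loss has gradient Lipschitz constant sigma_max(X)^2 / (4n);
    # its inverse is a safe step size.
    step = 4.0 * n / np.linalg.norm(X, 2) ** 2
    w = np.zeros(X.shape[1])
    for _ in range(n_iters):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))         # predicted hallucination probability
        grad = X.T @ (p - y) / n
        w = w - step * grad
        # Soft-thresholding drives most weights to exactly zero.
        w = np.sign(w) * np.maximum(np.abs(w) - step * l1_strength, 0.0)
    return w


rng = np.random.default_rng(0)
n_neurons = 4000
planted = np.array([17, 1203, 3550])               # hypothetical "H-neuron" indices

X_train, y_train = make_synthetic_activations(800, n_neurons, planted, 2.0, rng)
X_test, y_test = make_synthetic_activations(400, n_neurons, planted, 2.0, rng)

w = fit_sparse_probe(X_train, y_train, l1_strength=0.2)
selected = np.flatnonzero(w)                       # neurons kept by the probe
accuracy = np.mean((X_test @ w > 0) == (y_test == 1))

print("selected neurons:", selected)
print("fraction of all neurons:", len(selected) / n_neurons)
print("held-out accuracy:", accuracy)
```

In this toy setting the L1 penalty zeroes out weights whose gradient never exceeds `l1_strength`, so only units that carry label information survive. On real activations the penalty strength, the choice of layer, and the labeling of responses would all need to be decided separately.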
