LANCET: Neural Intervention via Structural Entropy for Mitigating Faithfulness Hallucinations in LLMs
This work targets the reliability of LLMs for users who need accurate information, and it presents a new method rather than an incremental improvement.
The paper tackles faithfulness hallucinations in Large Language Models by proposing Lancet, a surgical neural intervention framework that uses structural entropy to block hallucination propagation pathways. It reports significant performance improvements over state-of-the-art methods on benchmark datasets.
Large Language Models have revolutionized information processing, yet their reliability is severely compromised by faithfulness hallucinations. Current approaches attempt to mitigate this issue through node-level adjustments or coarse suppression, but they often overlook the distributed nature of neural information, which leads to imprecise interventions. Recognizing that hallucinations propagate through specific forward transmission pathways, much like an infection, we aim to surgically block this flow using precise structural analysis. Building on this observation, we propose Lancet, a novel framework that achieves precise neural intervention by combining structural entropy with hallucination difference ratios. Lancet first locates hallucination-prone neurons via gradient-driven contrastive analysis, then maps their propagation pathways by minimizing structural entropy, and finally implements a hierarchical intervention strategy that preserves general model capabilities. Comprehensive evaluations across hallucination benchmark datasets demonstrate that Lancet significantly outperforms state-of-the-art methods, validating the effectiveness of our surgical approach to neural intervention.
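The pathway-mapping step rests on structural entropy, which the abstract names but does not define. As a rough illustration only, the sketch below computes the standard one-dimensional and two-dimensional structural entropy of a small undirected weighted graph, following the usual definitions from structural information theory. Whether Lancet uses exactly these formulas, and how it builds a graph over neurons, is not stated here. The function names, the toy graph, and the candidate partitions are all hypothetical.

```python
import numpy as np


def structural_entropy_1d(adj):
    """One-dimensional structural entropy of an undirected weighted graph.

    adj is a symmetric adjacency matrix with no self-loops (assumed).
    """
    adj = np.asarray(adj, dtype=float)
    degrees = adj.sum(axis=1)
    volume = degrees.sum()  # equals twice the total edge weight
    p = degrees[degrees > 0] / volume
    return float(-(p * np.log2(p)).sum())


def structural_entropy_2d(adj, partition):
    """Two-dimensional structural entropy for a given partition of the nodes.

    partition is a list of disjoint node-index collections covering the graph.
    """
    adj = np.asarray(adj, dtype=float)
    degrees = adj.sum(axis=1)
    volume = degrees.sum()
    total = 0.0
    for module in partition:
        idx = np.array(sorted(module))
        mod_volume = degrees[idx].sum()
        if mod_volume == 0:
            continue
        # Weight of edges with exactly one endpoint inside the module.
        inside = adj[np.ix_(idx, idx)].sum()
        cut = mod_volume - inside
        d = degrees[idx]
        d = d[d > 0]
        # Uncertainty of locating a node within its module.
        total -= (d / volume * np.log2(d / mod_volume)).sum()
        # Uncertainty of entering the module from outside.
        total -= cut / volume * np.log2(mod_volume / volume)
    return float(total)


# Toy graph (hypothetical): two triangles joined by a single bridge edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = np.zeros((6, 6))
for u, v in edges:
    adj[u, v] = adj[v, u] = 1.0

h1 = structural_entropy_1d(adj)
h2_good = structural_entropy_2d(adj, [{0, 1, 2}, {3, 4, 5}])  # follows the triangles
h2_bad = structural_entropy_2d(adj, [{0, 3}, {1, 4}, {2, 5}])  # cuts across them
h2_singletons = structural_entropy_2d(adj, [{i} for i in range(6)])
```

In this toy setting the partition that follows the two triangles has lower two-dimensional structural entropy than the one that cuts across them. This is the sense in which minimizing structural entropy can group tightly connected nodes into candidate pathways.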