AI | Mar 23

INTRYGUE: Induction-Aware Entropy Gating for Reliable RAG Uncertainty Estimation

Alexandra Bazarova, Andrei Volodichev, Daria Kotova, Alexey Zaytsev

arXiv:2603.21607 | 48.5 | h-index: 3 | Has Code

AI Analysis

This paper addresses hallucination detection in RAG systems, which matters to users who depend on factually reliable output. The contribution is incremental, since it builds on existing entropy-based methods.

The paper tackles unreliable uncertainty estimation in retrieval-augmented generation (RAG), which it attributes to a mechanistic paradox. It proposes INTRYGUE, which gates predictive entropy based on induction head activations, and reports that the method matches or outperforms UQ baselines across four benchmarks and six LLMs.

While retrieval-augmented generation (RAG) significantly improves the factual reliability of LLMs, it does not eliminate hallucinations, so robust uncertainty quantification (UQ) remains essential. In this paper, we reveal that standard entropy-based UQ methods often fail in RAG settings due to a mechanistic paradox. An internal "tug-of-war" inherent to context utilization appears: while induction heads promote grounded responses by copying the correct answer, they collaterally trigger the previously established "entropy neurons". This interaction inflates predictive entropy, causing the model to signal false uncertainty on accurate outputs. To address this, we propose INTRYGUE (Induction-Aware Entropy Gating for Uncertainty Estimation), a mechanistically grounded method that gates predictive entropy based on the activation patterns of induction heads. Evaluated across four RAG benchmarks and six open-source LLMs (4B to 13B parameters), INTRYGUE consistently matches or outperforms a wide range of UQ baselines. Our findings demonstrate that hallucination detection in RAG benefits from combining predictive uncertainty with interpretable, internal signals of context utilization.
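The abstract describes gating predictive entropy by induction-head activity but does not give the gating function. The following is only a minimal sketch of the general idea, not the authors' method: it assumes a hypothetical per-token `induction_scores` value in [0, 1] (standing in for whatever signal is derived from induction head activations) and a hard `threshold`, both of which are illustrative choices. Tokens that appear to be copied from the context contribute no entropy to the final uncertainty score.

```python
import math
from typing import Sequence


def token_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def gated_entropy(
    token_probs: Sequence[Sequence[float]],
    induction_scores: Sequence[float],
    threshold: float = 0.5,
) -> float:
    """Mean token entropy with a hard gate (an assumed, illustrative rule).

    token_probs: one probability distribution per generated token.
    induction_scores: hypothetical per-token score in [0, 1]; a high value
        is assumed to mean the token was copied from the retrieved context.
    threshold: tokens whose score exceeds this are gated out (entropy 0).
    """
    if len(token_probs) != len(induction_scores):
        raise ValueError("need one induction score per token")
    if not token_probs:
        return 0.0
    total = 0.0
    for probs, score in zip(token_probs, induction_scores):
        if score <= threshold:  # gate open: keep this token's entropy
            total += token_entropy(probs)
    return total / len(token_probs)


# Toy example: the first token is uncertain but strongly "copied".
dists = [[0.5, 0.5], [1.0, 0.0]]
scores = [0.9, 0.1]
plain = gated_entropy(dists, scores, threshold=1.0)  # gate never closes
gated = gated_entropy(dists, scores)                 # first token gated out
```

In this toy case the ungated mean entropy is driven entirely by the first token, so gating it out lowers the score. This mirrors the false-uncertainty effect the abstract describes, under the stated assumptions.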
