LG AI CL | Nov 13, 2025

The Map of Misbelief: Tracing Intrinsic and Extrinsic Hallucinations Through Attention Patterns

Elyes Hajji, Aymen Bouguerra, Fabio Arnez

arXiv:2511.10837v19.42 citations | Proceedings of the AAAI Symposium Series

Originality: Incremental advance

AI Analysis

This work addresses hallucination detection in LLMs for safety-critical applications, offering incremental improvements by leveraging attention patterns to better handle intrinsic hallucinations.

The paper distinguishes extrinsic from intrinsic hallucinations in Large Language Models and evaluates detection methods on each type. It finds that sampling-based methods such as Semantic Entropy are effective for extrinsic hallucinations but generally fail on intrinsic ones, while the authors' attention-based method, which aggregates attention over input tokens, improves detection of intrinsic hallucinations.

Large Language Models (LLMs) are increasingly deployed in safety-critical domains, yet remain susceptible to hallucinations. While prior works have proposed confidence representation methods for hallucination detection, most of these approaches rely on computationally expensive sampling strategies and often disregard the distinction between hallucination types. In this work, we introduce a principled evaluation framework that differentiates between extrinsic and intrinsic hallucination categories and evaluates detection performance across a suite of curated benchmarks. In addition, we leverage a recent attention-based uncertainty quantification algorithm and propose novel attention aggregation strategies that improve both interpretability and hallucination detection performance. Our experimental findings reveal that sampling-based methods like Semantic Entropy are effective for detecting extrinsic hallucinations but generally fail on intrinsic ones. In contrast, our method, which aggregates attention over input tokens, is better suited for intrinsic hallucinations. These insights provide new directions for aligning detection strategies with the nature of hallucination and highlight attention as a rich signal for quantifying model uncertainty.
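For readers unfamiliar with the sampling-based baseline named in the abstract, the following is a minimal sketch of a discrete Semantic Entropy estimate: sample several answers, group them by meaning, and take the entropy of the cluster frequencies. It is an illustration under stated assumptions, not the paper's implementation. The function names `semantic_entropy` and `normalized_match` are hypothetical, and the string-normalization matcher is a stand-in for the entailment-based equivalence check that Semantic Entropy normally relies on.

```python
import math
from typing import Callable, List


def semantic_entropy(samples: List[str],
                     same_meaning: Callable[[str, str], bool]) -> float:
    """Entropy (in nats) over meaning clusters of sampled answers.

    Assumption: `same_meaning` decides whether two answers are equivalent.
    Each sample joins the first cluster whose representative it matches.
    """
    clusters: List[List[str]] = []
    for s in samples:
        for c in clusters:
            if same_meaning(s, c[0]):
                c.append(s)
                break
        else:
            # No existing cluster matched, so start a new one.
            clusters.append([s])
    n = len(samples)
    # Cluster probability is estimated by its share of the samples.
    return -sum((len(c) / n) * math.log(len(c) / n) for c in clusters)


def normalized_match(a: str, b: str) -> bool:
    """Toy equivalence check (assumption): case- and punctuation-insensitive match."""
    def norm(t: str) -> str:
        return t.strip().lower().rstrip(".")
    return norm(a) == norm(b)
```

A low value means the sampled answers agree in meaning; a high value means they scatter across many meanings, which is the signal used to flag a likely hallucination. Because the estimate needs several generations per query, it illustrates the sampling cost that the abstract contrasts with attention-based signals.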
