ML LG Feb 22

Attention Deficits in Language Models: Causal Explanations for Procedural Hallucinations

Ahmed Karim, Fatima Sheaib, Zein Khamis, Maggie Chlon, Jad Awada, Leon Chlon

arXiv:2602.19239v1 1.7 h-index: 3

Originality: Incremental advance

AI Analysis

This work addresses a critical reliability issue for users who depend on language models to execute procedures accurately, though it is an incremental advance that builds on existing error-analysis frameworks.

The paper investigates procedural hallucinations in large language models, where a model fails to report a value it computed earlier even though that value is present in context, and identifies readout-stage routing failures as the primary cause. The authors show that an oracle intervention restating the true binding near the query can nearly eliminate these failures, yielding near-perfect accuracy on long-context binding tasks.

Large language models can follow complex procedures yet fail at a seemingly trivial final step: reporting a value they themselves computed moments earlier. We study this phenomenon as \emph{procedural hallucination}: failure to execute a verifiable, prompt-grounded specification even when the correct value is present in context. In long-context binding tasks with a known single-token candidate set, we find that many errors are readout-stage routing failures. Specifically, failures decompose into Stage~2A (gating) errors, where the model does not enter answer mode, and Stage~2B (binding) errors, where it enters answer mode but selects the wrong candidate (often due to recency bias). In the hard regime, Stage~2B accounts for most errors across model families in our tasks (Table~1). On Stage~2B error trials, a linear probe on the final-layer residual stream recovers the correct value far above chance (e.g., 74\% vs.\ 2\% on Qwen2.5-3B; Table~2), indicating that the answer is encoded but not used. We formalize ``present but not used'' via available vs.\ used mutual information and pseudo-prior interventions, yielding output-computable diagnostics and information-budget certificates. Finally, an oracle checkpointing intervention that restates the true binding near the query can nearly eliminate Stage~2B failures at long distance (e.g., Qwen2.5-3B $0/400 \rightarrow 399/400$ at $k = 1024$; Table~8).
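The Stage 2A / Stage 2B split described in the abstract can be illustrated with a minimal Python sketch. It assumes a simplified operationalization that the abstract does not confirm: a trial counts as a gating (2A) error when the model's top-1 token falls outside the known candidate set, and as a binding (2B) error when the token is a candidate but not the target. The function names, labels, and toy trials are hypothetical and are not taken from the paper.

```python
from collections import Counter
from typing import Iterable, Set, Tuple


def classify_trial(predicted: str, target: str, candidates: Set[str]) -> str:
    """Label one trial under the assumed operationalization.

    - "correct":  the predicted token equals the target.
    - "stage_2a": the predicted token is not in the candidate set
                  (proxy for "did not enter answer mode").
    - "stage_2b": the predicted token is a candidate but the wrong one.
    """
    if predicted == target:
        return "correct"
    if predicted not in candidates:
        return "stage_2a"
    return "stage_2b"


def error_breakdown(trials: Iterable[Tuple[str, str, Set[str]]]) -> Counter:
    """Count outcomes over (predicted, target, candidates) triples."""
    return Counter(classify_trial(p, t, c) for p, t, c in trials)


# Toy data, invented for illustration only.
candidates = {"A", "B", "C", "D"}
trials = [
    ("A", "A", candidates),    # correct readout
    ("C", "A", candidates),    # wrong candidate: binding error
    ("the", "A", candidates),  # non-candidate token: gating error
    ("D", "B", candidates),    # wrong candidate: binding error
]
breakdown = error_breakdown(trials)
print(dict(breakdown))
```

Under this proxy, the share of `stage_2b` among all errors is the quantity the abstract reports as dominant in the hard regime; the paper's actual criterion for "answer mode" may differ.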
