LG · CL · ML · May 6

Self-Attention as Transport: Limits of Symmetric Spectral Diagnostics

arXiv:2605.04893 · 5.2 · h-index: 8
AI Analysis

For researchers diagnosing and mitigating hallucinations in large language models, this work provides a theoretically grounded, interpretable diagnostic framework that distinguishes failure modes by polarity.

The paper proves that transpose-invariant spectral diagnostics of the symmetric attention operator are blind to information-flow direction, introduces the asymmetry coefficient G as the unique control parameter for direction, and shows that a two-axis diagnostic (ϕ for capacity, G for direction) yields falsifiable polarity predictions that reverse between hallucination benchmarks (HaluEval and MedHallu), with LC-AUROC ranging from 0.62 to 0.84 on models up to 8B parameters.
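The orientation-blindness claim can be checked on a toy example. The minimal NumPy sketch below builds a random causal (row-stochastic, lower-triangular) attention matrix A, forms the symmetric part S = (A + Aᵀ)/2 with a D^{-1/2} S D^{-1/2} normalization (our assumption about what "degree-normalized" means here), and confirms that its spectrum is identical for A and Aᵀ. The helpers `asymmetry_ratio` and `net_forward_flow` are hypothetical stand-ins, not the paper's definition of G: the first measures how much of A lives in the antisymmetric part K = (A − Aᵀ)/2 that S discards, the second is a signed quantity built from K that flips under transposition.

```python
import numpy as np


def causal_attention(n: int, rng: np.random.Generator) -> np.ndarray:
    """Random row-stochastic, lower-triangular (causal) attention matrix."""
    logits = rng.normal(size=(n, n))
    mask = np.tril(np.ones((n, n), dtype=bool))
    logits = np.where(mask, logits, -np.inf)          # block attention to future keys
    weights = np.exp(logits - logits.max(axis=1, keepdims=True))
    return weights / weights.sum(axis=1, keepdims=True)


def normalized_symmetric_operator(a: np.ndarray) -> np.ndarray:
    """D^{-1/2} S D^{-1/2} with S = (A + A^T)/2 (assumed normalization)."""
    s = 0.5 * (a + a.T)
    inv_sqrt = 1.0 / np.sqrt(s.sum(axis=1))           # degrees are strictly positive
    return inv_sqrt[:, None] * s * inv_sqrt[None, :]


def asymmetry_ratio(a: np.ndarray) -> float:
    """Hypothetical stand-in: share of the Frobenius norm carried by K = (A - A^T)/2."""
    k = 0.5 * (a - a.T)
    return float(np.linalg.norm(k) / np.linalg.norm(a))


def net_forward_flow(a: np.ndarray) -> float:
    """Hypothetical signed measure: sum of K below the diagonal (query i > key j)."""
    k = 0.5 * (a - a.T)
    return float(np.tril(k, -1).sum())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = causal_attention(64, rng)
    spec_a = np.linalg.eigvalsh(normalized_symmetric_operator(a))
    spec_at = np.linalg.eigvalsh(normalized_symmetric_operator(a.T))
    print(np.allclose(spec_a, spec_at))               # True: spectrum cannot tell A from A^T
    print(asymmetry_ratio(a), asymmetry_ratio(a.T))   # same value for both orientations
    print(net_forward_flow(a), net_forward_flow(a.T)) # same magnitude, opposite sign
```

Note that the unsigned ratio is itself transpose-invariant: it says how much directional structure exists, not which way it points. Telling A from Aᵀ requires a signed quantity built from K, such as the net forward flow above.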

Large language models hallucinate in predictable ways: attention routing fails by over-concentrating on a narrow set of positions, or by spreading so diffusely that relevance is diluted, and the shape of the failure carries diagnostic signal. A widely used family of spectral methods analyzes the symmetric component of the degree-normalized attention operator, which governs transport capacity; we prove that every transpose-invariant spectral diagnostic of this operator is structurally orientation-blind (it cannot distinguish an operator from its transpose, and therefore cannot detect information-flow direction), with a quantitative converse establishing the asymmetry coefficient $G$ as the unique control parameter for direction. Pairing this with a closed-form bipartite-Cheeger landscape for canonical causal architectures, we show that uniform causal attention satisfies an $n$-independent floor $\phi \ge 1/5$ with worst cut at $t^\ast/n \approx 0.32$, while window attention pierces the floor as $O(w/n)$; failure modes are shape-different, not just value-different. The resulting two-axis diagnostic ($\phi$ for capacity, $G$ for direction) yields a falsifiable polarity prediction: bottleneck- and diffuse-dominated benchmarks should exhibit opposite polarity. Under length-controlled evaluation, transport features retain interpretable signal (LC-AUROC between 0.62 and 0.84) on tested models up to 8B parameters, with polarity reversing as predicted between HaluEval and MedHallu.
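The capacity axis can be explored numerically. This sketch does not reproduce the paper's bipartite-Cheeger construction, which the abstract does not spell out; as a stand-in it computes ordinary conductance, cut(S, S̄) / min(vol S, vol S̄), over prefix cuts of the symmetrized operator (A + Aᵀ)/2, for uniform causal attention and for a sliding window of width w (both helper functions are our own illustrations). Under this stand-in, a short continuum calculation puts the worst prefix cut of uniform causal attention near t/n ≈ 0.32, in line with the reported worst cut, with conductance about 0.36 regardless of n (above the 1/5 floor, not equal to it), while the window operator's best cut falls like w/(2n).

```python
import numpy as np


def uniform_causal(n: int) -> np.ndarray:
    """Uniform causal attention: query i spreads weight 1/(i+1) over keys 0..i."""
    a = np.tril(np.ones((n, n)))
    return a / a.sum(axis=1, keepdims=True)


def window_causal(n: int, w: int) -> np.ndarray:
    """Sliding-window causal attention: query i attends uniformly to keys max(0, i-w+1)..i."""
    i, j = np.indices((n, n))
    a = ((j <= i) & (i - j < w)).astype(float)
    return a / a.sum(axis=1, keepdims=True)


def prefix_conductance(a: np.ndarray) -> np.ndarray:
    """Conductance of each prefix cut {0..t-1} | {t..n-1}, t = 1..n-1, on S = (A + A^T)/2."""
    s = 0.5 * (a + a.T)                       # symmetric (capacity) part
    deg = s.sum(axis=1)
    total = deg.sum()
    vol_prefix = np.cumsum(deg)[:-1]          # vol({0..t-1})
    block = s.cumsum(axis=0).cumsum(axis=1)   # block[r, c] = sum of s[:r+1, :c+1]
    inside = np.diag(block)[:-1]              # weight with both endpoints in the prefix
    cut = vol_prefix - inside                 # weight crossing the cut
    return cut / np.minimum(vol_prefix, total - vol_prefix)


if __name__ == "__main__":
    n = 1000
    phi_u = prefix_conductance(uniform_causal(n))
    t_star = int(np.argmin(phi_u)) + 1
    print(f"uniform causal: min phi = {phi_u.min():.3f} at t/n = {t_star / n:.3f}")
    for n_w in (1000, 2000):
        phi_w = prefix_conductance(window_causal(n_w, w=32))
        print(f"window w=32, n={n_w}: min phi = {phi_w.min():.4f}")
```

Restricting to prefix cuts is a simplifying assumption: it follows the position ordering that causal masking induces, whereas a true Cheeger constant minimizes over all vertex subsets.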
