Self-Attention as Transport: Limits of Symmetric Spectral Diagnostics

Dominik Dahlem, Diego Maniloff, Mac Misiura

arXiv:2605.04893 · 5.2 · h-index: 8

Predicted impact: top 76% in LG · last 90 days · Originality: Highly original

AI Analysis

For researchers diagnosing and mitigating hallucinations in large language models, this work provides a theoretically grounded, interpretable diagnostic framework that distinguishes failure modes by polarity.

The paper proves that transpose-invariant spectral diagnostics of the symmetric attention operator are blind to information-flow direction, and it introduces the asymmetry coefficient G as the unique control parameter for direction. A two-axis diagnostic (ϕ for capacity, G for direction) yields a falsifiable polarity prediction that reverses between the hallucination benchmarks HaluEval and MedHallu. Under length-controlled evaluation, LC-AUROC ranges from 0.62 to 0.84 on tested models up to 8B parameters.

Large language models hallucinate in predictable ways: attention routing fails by over-concentrating on a narrow set of positions, or by spreading so diffusely that relevance is diluted, and the shape of the failure carries diagnostic signal. A widely used family of spectral methods analyzes the symmetric component of the degree-normalized attention operator, which governs transport capacity; we prove that every transpose-invariant spectral diagnostic of this operator is structurally orientation-blind (it cannot distinguish an operator from its transpose, and therefore cannot detect information-flow direction), with a quantitative converse establishing the asymmetry coefficient $G$ as the unique control parameter for direction. Pairing this with a closed-form bipartite-Cheeger landscape for canonical causal architectures, we show that uniform causal attention satisfies an $n$-independent floor $\phi \ge 1/5$ with worst cut at $t^\ast/n \approx 0.32$, while window attention pierces the floor as $O(w/n)$; failure modes are shape-different, not just value-different. The resulting two-axis diagnostic ($\phi$ for capacity, $G$ for direction) yields a falsifiable polarity prediction: bottleneck- and diffuse-dominated benchmarks should exhibit opposite polarity. Under length-controlled evaluation, transport features retain interpretable signal (LC-AUROC from 0.62 to 0.84) on tested models up to 8B parameters, with polarity reversing as predicted between HaluEval and MedHallu.
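The two claims in the abstract can be illustrated with a small pure-Python sketch. It is not the authors' code, and the abstract does not give the exact definitions of $\phi$ or $G$, so the sketch makes its own assumptions: the symmetric part is taken as $(A + A^\top)/2$ of a row-stochastic attention matrix, capacity is measured as ordinary graph conductance over prefix cuts, and `direction_index` is a hypothetical signed asymmetry score standing in for $G$. All function names are illustrative.

```python
def uniform_causal(n):
    """Row-stochastic causal attention: position i attends uniformly to 0..i."""
    return [[1.0 / (i + 1) if j <= i else 0.0 for j in range(n)] for i in range(n)]


def window_causal(n, w):
    """Causal sliding window: position i attends uniformly to its last min(i+1, w) positions."""
    rows = []
    for i in range(n):
        k = min(i + 1, w)
        rows.append([1.0 / k if i - k < j <= i else 0.0 for j in range(n)])
    return rows


def transpose(a):
    return [list(col) for col in zip(*a)]


def symmetric_part(a):
    """(A + A^T) / 2. Identical for A and A^T, so any function of it is orientation-blind."""
    n = len(a)
    return [[0.5 * (a[i][j] + a[j][i]) for j in range(n)] for i in range(n)]


def direction_index(a):
    """ASSUMED stand-in for the paper's G (not its definition): net attention mass
    from later to earlier positions, divided by total off-diagonal mass.
    It flips sign under transposition."""
    n = len(a)
    net = sum(a[i][j] - a[j][i] for i in range(n) for j in range(i))
    total = sum(a[i][j] + a[j][i] for i in range(n) for j in range(i))
    return net / total


def worst_prefix_cut(a):
    """Minimum conductance over prefix cuts {0..t-1} | {t..n-1} of the symmetric part.
    ASSUMPTION: standard conductance cut / min(vol(S), vol(S^c)); the paper's
    bipartite-Cheeger quantity may be normalized differently.
    Returns (phi_min, t_star)."""
    n = len(a)
    s = symmetric_part(a)
    degree = [sum(row) for row in s]
    total_vol = sum(degree)
    best_phi, best_t = float("inf"), None
    vol = 0.0
    for t in range(1, n):
        vol += degree[t - 1]  # volume of the prefix {0..t-1}
        cut = sum(s[i][j] for i in range(t, n) for j in range(t))
        phi = cut / min(vol, total_vol - vol)
        if phi < best_phi:
            best_phi, best_t = phi, t
    return best_phi, best_t


if __name__ == "__main__":
    n = 200
    a = uniform_causal(n)
    print("symmetric parts equal:", symmetric_part(a) == symmetric_part(transpose(a)))
    print("direction index (A, A^T):", direction_index(a), direction_index(transpose(a)))
    print("uniform causal (phi_min, t*):", worst_prefix_cut(a))
    print("window w=8 (phi_min, t*):", worst_prefix_cut(window_causal(n, 8)))
```

Under these assumed definitions, the symmetric part of $A$ and of $A^\top$ is the same matrix, so no spectral quantity computed from it can tell the two apart, while the signed score changes sign. The prefix-cut scan for uniform causal attention stays above $1/5$, with its worst cut roughly a third of the way through the sequence, and the windowed variant falls well below that level. The exact values depend on the normalization chosen here and should not be read as the paper's closed-form results.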
