AI CL Feb 14, 2024

Spectral Filters, Dark Signals, and Attention Sinks

arXiv:2402.09221v149 citations h-index: 4 ACL

Originality: Incremental advance

AI Analysis

This work provides a quantitative interpretation tool for LLMs. The advance is incremental: it builds on existing ideas such as the logit lens and attention sinking.

The authors tackle the problem of interpreting transformer-based LLMs by extending the logit lens with spectral filters on intermediate representations. They find that attention sinking is linked to signals in the tail end of the spectrum, and that parts of the spectrum can be suppressed without raising the loss much, provided attention sinking is maintained.

Projecting intermediate representations onto the vocabulary, an approach known as the logit lens, is an increasingly popular interpretation tool for transformer-based LLMs. We propose a quantitative extension to this approach and define spectral filters on intermediate representations based on partitioning the singular vectors of the vocabulary embedding and unembedding matrices into bands. We find that the signals exchanged in the tail end of the spectrum are responsible for attention sinking (Xiao et al. 2023), and we provide an explanation for this. We find that the loss of pretrained models can be kept low despite suppressing sizable parts of the embedding spectrum in a layer-dependent way, as long as attention sinking is preserved. Finally, we discover that the representations of tokens that draw attention from many tokens have large projections on the tail end of the spectrum.
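The band-partitioning idea in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not state the band boundaries, the number of bands, or the matrix layout, so the sketch assumes an unembedding matrix of shape (vocab, d_model) with vocab >= d_model and equal-width bands over the singular vectors sorted by decreasing singular value. The function names `band_filters` and `suppress_bands` are hypothetical, and the matrices are random stand-ins for real model weights.

```python
import numpy as np


def band_filters(W, n_bands):
    """Build one orthogonal projector per spectral band.

    W is assumed to have shape (vocab, d_model) with vocab >= d_model.
    Its right singular vectors span the model's hidden space; they are
    split into n_bands contiguous groups (an assumption of this sketch).
    """
    # Rows of Vt are right singular vectors, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(W, full_matrices=False)
    d = Vt.shape[0]
    edges = np.linspace(0, d, n_bands + 1).astype(int)
    filters = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        B = Vt[lo:hi]             # (band_size, d_model)
        filters.append(B.T @ B)   # projector onto this band's subspace
    return filters


def suppress_bands(h, filters, drop):
    """Remove the components of h that lie in the bands listed in `drop`."""
    keep = np.zeros_like(filters[0])
    for i, f in enumerate(filters):
        if i not in drop:
            keep += f
    # Each projector is symmetric, so right-multiplying row vectors is valid.
    return h @ keep


# Random stand-ins for an unembedding matrix and a batch of hidden states.
rng = np.random.default_rng(0)
W = rng.standard_normal((50, 8))
h = rng.standard_normal((3, 8))

filters = band_filters(W, n_bands=4)
h_no_tail = suppress_bands(h, filters, drop={3})  # drop the tail band only
```

Under these assumptions the band projectors sum to the identity, so dropping no bands returns the hidden states unchanged, and dropping the last band removes exactly the tail-end component that the abstract associates with attention sinking.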
