AI | CL | Mar 5

The Spike, the Sparse and the Sink: Anatomy of Massive Activations and Attention Sinks

arXiv:2603.05498v1 | 7 citations
Originality: Incremental advance
AI Analysis

This work clarifies the functional roles and causal relationship of two common Transformer phenomena, massive activations and attention sinks. The result is relevant to researchers and practitioners working on Transformer architecture and interpretability.

This paper investigates massive activations and attention sinks in Transformer language models, finding that their frequent co-occurrence is an architectural artifact. Massive activations act as global implicit parameters by inducing persistent hidden representations, while attention sinks locally modulate attention outputs and bias heads towards short-range dependencies.

We study two recurring phenomena in Transformer language models: massive activations, in which a small number of tokens exhibit extreme outliers in a few channels, and attention sinks, in which certain tokens attract disproportionate attention mass regardless of semantic relevance. Prior work observes that these phenomena frequently co-occur and often involve the same tokens, but their functional roles and causal relationship remain unclear. Through systematic experiments, we show that the co-occurrence is largely an architectural artifact of modern Transformer design, and that the two phenomena serve related but distinct functions. Massive activations operate globally: they induce near-constant hidden representations that persist across layers, effectively functioning as implicit parameters of the model. Attention sinks operate locally: they modulate attention outputs across heads and bias individual heads toward short-range dependencies. We identify the pre-norm configuration as the key choice that enables the co-occurrence, and show that ablating it causes the two phenomena to decouple.
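The two definitions in the abstract can be made concrete with a small sketch. The code below is illustrative only and is not the authors' method. The function names, the `ratio` threshold of 100 times the median magnitude, and the causal averaging scheme are all assumptions chosen for the example. It flags hidden-state entries that are extreme outliers, and measures how much attention each key position receives on average.

```python
from statistics import median


def find_massive_activations(hidden, ratio=100.0):
    """Return (token, channel) pairs whose magnitude exceeds ratio * median magnitude.

    hidden: list of per-token rows, each a list of channel values.
    ratio:  hypothetical outlier threshold, not a value taken from the paper.
    """
    magnitudes = [abs(v) for row in hidden for v in row]
    threshold = ratio * median(magnitudes)
    return [
        (t, c)
        for t, row in enumerate(hidden)
        for c, v in enumerate(row)
        if abs(v) > threshold
    ]


def attention_mass_per_key(attn):
    """Average attention each key receives from the queries that can see it.

    attn: square causal attention matrix for one head, attn[query][key],
          with each row summing to 1. Key k is visible to queries k..n-1.
    """
    n = len(attn)
    return [sum(attn[q][k] for q in range(k, n)) / (n - k) for k in range(n)]


# Toy hidden states: 4 tokens x 6 channels, one extreme value at token 0, channel 2.
hidden = [[0.5] * 6 for _ in range(4)]
hidden[0][2] = 300.0

# Toy causal attention for one head: every query puts most of its mass on token 0.
attn = [
    [1.0, 0.0, 0.0, 0.0],
    [0.9, 0.1, 0.0, 0.0],
    [0.8, 0.1, 0.1, 0.0],
    [0.7, 0.1, 0.1, 0.1],
]

print(find_massive_activations(hidden))
print(attention_mass_per_key(attn))
```

In this toy input the same token (position 0) carries both the outlier activation and most of the attention mass, which mirrors the co-occurrence the abstract describes. Per the abstract, that overlap depends on the pre-norm configuration, so a check like this on a real model should be run per layer and per head rather than read as a general property.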
