PR · LG · DS · Mar 6

Random Quadratic Form on a Sphere: Synchronization by Common Noise

arXiv:2603.06187v1 · 2 citations · h-index: 4
Predicted impact: top 14% in PR · last 90 days
Originality: Incremental advance
AI Analysis

This work offers an alternative explanation for token clustering in deep transformers: common noise in linear layers can induce synchronization and clustering even without self-attention. The result is relevant to researchers studying how transformer architectures behave at depth.

This paper introduces the Random Quadratic Form (RQF), a stochastic differential equation that formally corresponds to the gradient flow of a random quadratic functional on a sphere. The authors show that although the one-point dynamics is a Brownian motion, the two-point motion synchronizes, and they characterize the solutions both in distribution and path-wise through invariant measures and random attractors.

We introduce the Random Quadratic Form (RQF): a stochastic differential equation which formally corresponds to the gradient flow of a random quadratic functional on a sphere. While the one-point dynamics of the system is a Brownian motion and thus has no preferred direction, the two-point motion exhibits nontrivial synchronizing behaviour. In this work we study synchronization of the RQF: we give both distributional and path-wise characterizations of the solutions by studying invariant measures and random attractors of the system. The RQF model is motivated by the study of the role of linear layers in transformers and illustrates the synchronization-by-common-noise phenomenon arising in simplified models of transformers. In particular, we provide an alternative explanation of the clustering behaviour in deep transformers, independent of self-attention, and show that tokens cluster even in the absence of the self-attention mechanism.
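To make the "common noise" setup concrete, here is a minimal Python sketch of two points on a sphere driven by the same random symmetric matrix at every step. The abstract does not state the equation, so everything specific here is an assumption made for illustration: the discretization (an Euler-type step followed by renormalization), the noise law (independent Gaussian entries of scale sqrt(dt), symmetrized), and all function names and parameters. It is not the authors' scheme.

```python
import math
import random


def normalize(v):
    """Project a nonzero vector back onto the unit sphere."""
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def symmetric_noise(d, dt, rng):
    """Draw a symmetric d x d Gaussian increment (assumed noise law)."""
    scale = math.sqrt(dt)
    a = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i, d):
            a[i][j] = a[j][i] = rng.gauss(0.0, scale)
    return a


def step(x, noise):
    """Move x along noise @ x, then renormalize.

    To first order this follows the tangential part of noise @ x,
    i.e. the sphere gradient of the quadratic form 0.5 * x^T noise x.
    """
    d = len(x)
    moved = [x[i] + sum(noise[i][j] * x[j] for j in range(d)) for i in range(d)]
    return normalize(moved)


def simulate_pair(d=3, dt=0.01, steps=2000, seed=0):
    """Evolve two points under the SAME noise; record |<x, y>| per step."""
    rng = random.Random(seed)
    x = normalize([rng.gauss(0.0, 1.0) for _ in range(d)])
    y = normalize([rng.gauss(0.0, 1.0) for _ in range(d)])
    overlaps = []
    for _ in range(steps):
        noise = symmetric_noise(d, dt, rng)  # one draw, shared by both points
        x = step(x, noise)
        y = step(y, noise)
        # The absolute value is used because a quadratic form cannot
        # distinguish x from -x.
        overlaps.append(abs(sum(a * b for a, b in zip(x, y))))
    return x, y, overlaps


if __name__ == "__main__":
    _, _, overlaps = simulate_pair()
    print("first overlap:", overlaps[0])
    print("last overlap: ", overlaps[-1])
```

An overlap near 1 means the two points have aligned up to sign. Whether and how fast that happens in this toy discretization depends on the assumed noise law, the dimension, and the step size, so the printed values are a qualitative illustration only, not a check of the paper's results.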

