LG · SY · DS · Nov 14, 2025

Multistability of Self-Attention Dynamics in Transformers

arXiv:2511.11553v1 · 3 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses the theoretical analysis of transformer attention mechanisms for researchers in machine learning, but it appears incremental as it builds on existing dynamical systems models.

The paper studies the equilibria of self-attention dynamics in transformers. It shows that these dynamics are related to a multiagent Oja flow and classifies the equilibria into four classes (consensus, bipartite consensus, clustering, and polygonal), with multiple asymptotically stable equilibria often coexisting.

In machine learning, self-attention dynamics are a continuous-time, multiagent-like model of the attention mechanisms of transformers. In this paper we show that such dynamics are related to a multiagent version of the Oja flow, a dynamical system that computes the principal eigenvector of a matrix, which for transformers corresponds to the value matrix. We classify the equilibria of the "single-head" self-attention system into four classes: consensus, bipartite consensus, clustering, and polygonal equilibria. Multiple asymptotically stable equilibria from the first three classes often coexist in the self-attention dynamics. Interestingly, equilibria from the first two classes are always aligned with the eigenvectors of the value matrix, often but not exclusively with the principal eigenvector.
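
To make the Oja flow mentioned in the abstract concrete, here is a minimal sketch of the classical single-agent version, dx/dt = A x - (x^T A x) x, integrated with forward Euler steps. It is not the paper's multiagent model, which this page does not spell out. The symmetric test matrix, its eigenvalues, the step size, and the function name `oja_flow` are all illustrative assumptions.

```python
import numpy as np


def oja_flow(A, x0, dt=0.01, steps=5000):
    """Integrate the single-agent Oja flow dx/dt = A x - (x^T A x) x.

    Forward Euler with renormalization after each step, since the
    continuous flow preserves the unit norm but Euler steps drift.
    Assumes A is symmetric (an assumption made for this sketch).
    """
    x = x0 / np.linalg.norm(x0)
    for _ in range(steps):
        Ax = A @ x
        x = x + dt * (Ax - (x @ Ax) * x)
        x = x / np.linalg.norm(x)
    return x


rng = np.random.default_rng(0)

# Hypothetical symmetric matrix with a known spectrum (largest eigenvalue 3).
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))
A = Q @ np.diag([3.0, 1.0, 0.5, -1.0]) @ Q.T

x = oja_flow(A, rng.standard_normal(4))

# Compare with the principal eigenvector from a direct eigendecomposition.
eigvals, eigvecs = np.linalg.eigh(A)  # eigenvalues in ascending order
v_top = eigvecs[:, -1]
alignment = abs(x @ v_top)  # near 1 when x is aligned with v_top up to sign
print(f"alignment with principal eigenvector: {alignment:.6f}")
```

For a generic initial condition the trajectory aligns with the principal eigenvector up to sign, which is the behavior the abstract attributes to the Oja flow. The paper's finding that consensus equilibria may also align with non-principal eigenvectors concerns the multiagent self-attention system, which this single-agent sketch does not capture.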

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.