Training-Induced Escape from Token Clustering in a Mean-Field Formulation of Transformers
For theorists studying Transformer dynamics, this work provides a first step toward a training-aware mean-field theory, though it is limited to a simplified setting in which only a parameter-linear FFN is trained, under $L^2$ regularization.
The paper studies how training modifies the clustering dynamics of tokens in Transformers. It shows that training can cause the token distribution to leave the clustered regime near the final layers, a behavior not captured by earlier mean-field theories, which largely treat model parameters as prescribed. The analysis rests on an entropy-regularized interaction energy that captures the clustering bias of attention.
Transformers perform inference by iteratively transforming token representations across layers. This layerwise computation has been studied empirically, and recent mean-field theories of Transformer dynamics explain how attention can drive token distributions toward clustering. However, existing mean-field analyses largely treat model parameters as prescribed, leaving open how training reshapes this clustering picture. We study this question in a noisy mean-field Transformer in which only a parameter-linear FFN is trained under $L^2$ regularization. We find and analyze a training-induced phase in the dynamics: after initially following attention-driven clustering, the token distribution can leave the clustered regime near the final layers. Our mathematical analysis is based on an entropy-regularized interaction energy that captures the clustering bias of attention. More broadly, our results point toward a training-aware mean-field theory of Transformer dynamics, in which training and inference dynamics are treated together.
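As a point of reference for the fixed-parameter picture described above, the following is a minimal sketch of attention-driven clustering for a finite set of tokens. It is an illustration under stated assumptions, not the model analyzed in the paper: tokens are assumed to live on the unit sphere, the query, key, and value matrices are taken to be the identity, each layer is an explicit Euler step, and there is no noise, no FFN, and no training. The names `attention_step`, `mean_pairwise_similarity`, and `simulate`, and the parameters `beta` and `dt`, are hypothetical.

```python
import numpy as np


def attention_step(x, beta=1.0, dt=0.1):
    """One explicit Euler 'layer' of self-attention dynamics on the unit sphere.

    Assumption: identity query/key/value matrices and no FFN or noise.
    """
    scores = beta * (x @ x.T)                      # pairwise inner products
    scores -= scores.max(axis=1, keepdims=True)    # stabilize the softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)  # row-stochastic attention
    v = weights @ x                                # attention output per token
    # Project each update onto the tangent space of the sphere at x_i.
    v -= np.sum(v * x, axis=1, keepdims=True) * x
    x_new = x + dt * v
    # Renormalize so tokens stay on the unit sphere.
    return x_new / np.linalg.norm(x_new, axis=1, keepdims=True)


def mean_pairwise_similarity(x):
    """Average inner product over distinct token pairs (1.0 = one cluster)."""
    n = x.shape[0]
    gram = x @ x.T
    return (gram.sum() - np.trace(gram)) / (n * (n - 1))


def simulate(n_tokens=32, dim=3, n_layers=500, beta=1.0, dt=0.1, seed=0):
    """Run the fixed-parameter dynamics from random initial tokens."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(n_tokens, dim))
    x /= np.linalg.norm(x, axis=1, keepdims=True)
    history = [mean_pairwise_similarity(x)]
    for _ in range(n_layers):
        x = attention_step(x, beta=beta, dt=dt)
        history.append(mean_pairwise_similarity(x))
    return x, np.array(history)


if __name__ == "__main__":
    tokens, history = simulate()
    print("initial mean similarity:", history[0])
    print("final mean similarity:  ", history[-1])
```

In this toy setting the mean pairwise similarity rises toward 1 as depth increases, which is the clustering behavior that fixed-parameter analyses describe. Reproducing the training-induced escape near the final layers would require the additional ingredients studied in the paper (noise and a trained parameter-linear FFN with $L^2$ regularization), which this sketch deliberately omits.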