A Minimal Model of Representation Collapse: Frustration, Stop-Gradient, and Dynamics

arXiv:2604.09979 | 92.0 | h-index: 4

AI Analysis

For researchers studying self-supervised learning, this work provides a theoretical understanding of collapse mechanisms and prevention, though the model is highly simplified.

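The paper introduces a minimal embedding-only model for analyzing representation collapse in self-supervised learning. It shows that a small fraction of frustrated samples, which cannot be classified consistently, induces collapse, while a shared projection head combined with stop-gradient permits non-collapsed solutions. The model's gradient-flow dynamics and fixed points are analytically tractable, and the same qualitative behavior is observed empirically in a linear teacher-student model.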

Self-supervised representation learning is central to modern machine learning because it extracts structured latent features from unlabeled data and enables robust transfer across tasks and domains. However, it can suffer from representation collapse, a widely observed failure mode in which embeddings lose discriminative structure and distinct inputs become indistinguishable. To understand the mechanisms that drive collapse and the ingredients that prevent it, we introduce a minimal embedding-only model whose gradient-flow dynamics and fixed points can be analyzed in closed form, using a classification-representation setting as a concrete playground where collapse is directly quantified through the contraction of label-embedding geometry. We illustrate that the model does not collapse when the data are perfectly classifiable, while a small fraction of frustrated samples that cannot be classified consistently induces collapse through an additional slow time scale that follows the early performance gain. Within the same framework, we examine collapse prevention by adding a shared projection head and applying stop-gradient at the level of the training dynamics. We analyze the resulting fixed points and develop a dynamical mean-field style self-consistency description, showing that stop-gradient enables non-collapsed solutions and stabilizes finite class separation under frustration. We further verify empirically that the same qualitative dynamics and collapse-prevention effects appear in a linear teacher-student model, indicating that the minimal theory captures features that persist beyond the pure embedding setting.
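
The frustration mechanism described above can be illustrated with a toy sketch. This is not the paper's model: the quadratic loss, the one-dimensional embeddings, the initial values, and the function name `simulate` are all assumptions chosen for illustration, and the projection head and stop-gradient are not modeled. Each sample has a free embedding that is pulled toward its class's label embedding, and one "frustrated" sample is attached to both labels, so it pulls the two label embeddings toward each other.

```python
# Toy sketch (assumed setup, not the paper's model): gradient descent on
# L = 1/2 * sum_i (z_i - u_{y_i})^2 with free 1-D sample embeddings z_i
# and one label embedding u_c per class.

def simulate(frustrated, steps=2000, lr=0.05):
    """Return the final class separation u[+1] - u[-1]."""
    u = {+1: 1.0, -1: -1.0}          # label embeddings (hypothetical init)
    z = [0.8, 1.2, -0.9, -1.1]       # clean sample embeddings
    y = [+1, +1, -1, -1]             # their labels
    zf = 0.0                         # frustrated sample, tied to BOTH labels

    for _ in range(steps):
        grad_u = {+1: 0.0, -1: 0.0}
        grad_z = [0.0] * len(z)
        grad_zf = 0.0

        # Clean samples: each one interacts only with its own label.
        for i, (zi, yi) in enumerate(zip(z, y)):
            diff = zi - u[yi]
            grad_z[i] += diff        # dL/dz_i
            grad_u[yi] -= diff       # dL/du_{y_i}

        # Frustrated sample: contributes a loss term for each label.
        if frustrated:
            for c in (+1, -1):
                diff = zf - u[c]
                grad_zf += diff
                grad_u[c] -= diff

        # Explicit Euler step of the gradient flow.
        for c in u:
            u[c] -= lr * grad_u[c]
        z = [zi - lr * gi for zi, gi in zip(z, grad_z)]
        zf -= lr * grad_zf

    return u[+1] - u[-1]


if __name__ == "__main__":
    print("separation, clean data:     ", simulate(frustrated=False))
    print("separation, with frustration:", simulate(frustrated=True))
```

Under these assumptions the separation stays near its initial value of 2.0 on clean data and shrinks toward 0 once the frustrated sample is included. In this toy the within-class alignment is fast, while the collapse proceeds more slowly because the two classes are coupled only through the single frustrated sample.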
