Dual Perspectives on Non-Contrastive Self-Supervised Learning
This work gives theoretical insight into how non-contrastive self-supervised learning avoids collapse, a question that is central to representation learning. The contribution is incremental in that it builds on existing methods.
The paper investigates the stop gradient and exponential moving average procedures in non-contrastive self-supervised learning. It shows that they avoid representation collapse even though they do not optimize the original objective. In the linear case, it proves that these procedures lead to asymptotically stable equilibria, whereas minimizing the original objective always leads to collapse.
The {\em stop gradient} and {\em exponential moving average} iterative procedures are commonly used in non-contrastive approaches to self-supervised learning to avoid representation collapse, with excellent performance in downstream applications in practice. This presentation investigates these procedures from the dual viewpoints of optimization and dynamical systems. We show that, in general, although they {\em do not} optimize the original objective, or {\em any} other smooth function, they {\em do} avoid collapse. Following~\citet{Tian21}, but without any of the extra assumptions used in their proofs, we then show using a dynamical system perspective that, in the linear case, minimizing the original objective function without the use of a stop gradient or exponential moving average {\em always} leads to collapse. Conversely, we characterize explicitly the equilibria of the dynamical systems associated with these two procedures in this linear setting as algebraic varieties in their parameter space, and show that they are, in general, {\em asymptotically stable}. Our theoretical findings are illustrated by empirical experiments with real and synthetic data.
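The contrast between minimizing the original objective and using a stop gradient can be illustrated with a minimal scalar sketch. Everything in it is an assumption made for illustration and is not taken from the paper: a one-dimensional linear encoder `w`, a one-dimensional linear predictor `p`, two noisy views with second moments `A` and `B`, and the function names `step_plain`, `step_stop_gradient`, and `run`. The exponential moving average variant is not covered.

```python
# Toy scalar model (illustrative assumption, not the paper's setting):
# two views x1 = z + n1 and x2 = z + n2 with Var(z) = 1 and Var(n) = 0.25,
# encoder weight w, predictor weight p,
# loss L(w, p) = 0.5 * E[(p*w*x1 - w*x2)^2], using exact second moments.
A = 1.25  # E[x1^2] = E[x2^2]
B = 1.0   # E[x1*x2]


def step_plain(w, p, lr):
    """One gradient step on L, with gradients flowing through both branches."""
    grad_w = w * (A * p * p - 2.0 * B * p + A)
    grad_p = w * w * (A * p - B)
    return w - lr * grad_w, p - lr * grad_p


def step_stop_gradient(w, p, lr):
    """Same loss, but the target branch w*x2 is treated as a constant."""
    residual = A * p - B
    grad_w = w * p * residual
    grad_p = w * w * residual
    return w - lr * grad_w, p - lr * grad_p


def run(step, w=1.0, p=0.1, lr=0.05, n_steps=2000):
    """Iterate an update rule from a fixed initial point."""
    for _ in range(n_steps):
        w, p = step(w, p, lr)
    return w, p


w_plain, p_plain = run(step_plain)
w_sg, p_sg = run(step_stop_gradient)
print("plain objective:", w_plain, p_plain)
print("stop gradient:  ", w_sg, p_sg)
```

In this toy, the plain update drives `w` to zero because the factor multiplying `w**2` in the loss is bounded below by `A - B**2 / A > 0`, so shrinking the encoder always lowers the loss. Under the stop-gradient update, `p` converges to `B / A` and `w` stays away from zero from this initial point.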