Understanding Self-Supervised Learning via Latent Distribution Matching
This work provides a unifying theoretical framework for self-supervised learning that clarifies the assumptions behind existing methods and guides the design of new ones, benefiting researchers in representation learning.
Self-supervised learning is unified under a latent distribution matching framework, which explains existing methods and enables derivation of a new Bayesian filtering model for time series, with proven identifiability of latent representations.
Self-supervised learning (SSL) excels at finding general-purpose latent representations from complex data, yet lacks a unifying theoretical framework that explains the diverse existing methods and guides the design of new ones. We cast SSL as latent distribution matching (LDM): learning representations that maximize their log-probability under an assumed latent model (alignment), while maximizing latent entropy to prevent collapse (uniformity). This view unifies independent component analysis with contrastive, non-contrastive, and predictive SSL methods, including stop-gradient approaches. Leveraging LDM, we derive a nonlinear, sampling-free Bayesian filtering model with a Kalman-based predictor for high-dimensional time series. We further prove that predictive LDM yields identifiable latent representations under mild assumptions, even with nonlinear predictors. Overall, LDM clarifies the assumptions behind established SSL methods and provides principled guidance for developing new approaches.
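The alignment and uniformity terms described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes an isotropic Gaussian latent model for paired views, z_b ~ N(z_a, sigma^2 I), and uses the entropy of a Gaussian fitted to the batch as a crude stand-in for latent entropy. The names `alignment`, `gaussian_entropy`, `ldm_objective`, and the parameter `sigma` are hypothetical and chosen for illustration only.

```python
import numpy as np

def alignment(z_a, z_b, sigma=1.0):
    """Mean log-density of z_b under N(z_a, sigma^2 I).

    Assumption: an isotropic Gaussian latent model, picked for illustration.
    z_a, z_b: arrays of shape (batch, dim) holding paired latent codes.
    """
    d = z_a.shape[1]
    sq_dist = np.sum((z_b - z_a) ** 2, axis=1)
    log_norm = 0.5 * d * np.log(2.0 * np.pi * sigma ** 2)
    return np.mean(-0.5 * sq_dist / sigma ** 2 - log_norm)

def gaussian_entropy(z, eps=1e-6):
    """Differential entropy of a Gaussian fitted to z (an entropy proxy).

    Assumption: dim >= 2 so that np.cov returns a matrix; eps regularizes
    the covariance so the log-determinant stays finite under collapse.
    """
    d = z.shape[1]
    cov = np.cov(z, rowvar=False) + eps * np.eye(d)
    _, logdet = np.linalg.slogdet(cov)
    return 0.5 * (d * np.log(2.0 * np.pi * np.e) + logdet)

def ldm_objective(z_a, z_b, sigma=1.0):
    """Quantity to maximize: alignment plus the entropy proxy."""
    z_all = np.concatenate([z_a, z_b], axis=0)
    return alignment(z_a, z_b, sigma) + gaussian_entropy(z_all)
```

Under these assumptions, shrinking all latent codes toward a single point raises the alignment term slightly but lowers the entropy proxy far more, so the combined objective penalizes collapse. Other latent models or entropy estimators would change both terms.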