LG | AI | May 18, 2025

AdaDim: Dimensionality Adaptation for SSL Representational Dynamics

Kiran Kokilepersaud, Mohit Prabhushankar, Ghassan AlRegib

arXiv:2505.12576v2 | 11.43 citations | h-index: 14

Originality: Incremental advance

AI Analysis

This work addresses a key optimization challenge in SSL for machine learning practitioners, offering an incremental improvement by adaptively balancing training dynamics.

The paper tackles the problem of balancing dimensionality and mutual information in self-supervised learning (SSL) to improve representation quality, showing that AdaDim achieves up to 3% performance gains over baselines without using expensive techniques.

A key factor in effective self-supervised learning (SSL) is preventing dimensional collapse, where higher-dimensional representation spaces ($R$) span a lower-dimensional subspace. Therefore, SSL optimization strategies involve guiding a model to produce $R$ with a higher dimensionality ($H(R)$) through objectives that encourage decorrelation of features or sample uniformity in $R$. A higher $H(R)$ indicates that $R$ has greater feature diversity, which is useful for generalization to downstream tasks. Alongside dimensionality optimization, SSL algorithms also utilize a projection head that maps $R$ into an embedding space $Z$. Recent work has characterized the projection head as a filter of noisy or irrelevant features from the SSL objective by reducing the mutual information $I(R;Z)$. Therefore, the current literature's view is that a good SSL representation space should have a high $H(R)$ and a low $I(R;Z)$. However, this view lacks an understanding of the underlying training dynamics that influence the relationship between the two terms. Our analysis shows that the best-performing SSL models have neither the highest $H(R)$ nor the lowest $I(R;Z)$, but instead arrive at a balance between the two. To take advantage of this analysis, we introduce AdaDim, a training strategy that leverages SSL training dynamics by adaptively balancing between increasing $H(R)$ through feature decorrelation and sample uniformity and gradually regularizing $I(R;Z)$ as training progresses. We show performance improvements of up to 3% over common SSL baselines, even though our method does not use expensive techniques such as queues, clustering, predictor networks, or student-teacher architectures.
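
The abstract does not state how $H(R)$ is estimated or how the two objectives are weighted, so the following is only an illustrative sketch and not the AdaDim method. It assumes an entropy-based effective rank as a stand-in for representation dimensionality, a squared off-diagonal correlation penalty as one possible decorrelation objective, and a Gaussian-potential uniformity loss on the unit sphere as one possible uniformity objective. All function names and the synthetic data are hypothetical.

```python
import numpy as np


def effective_rank(reps: np.ndarray, eps: float = 1e-12) -> float:
    """Entropy-based effective rank of an (n_samples, n_features) matrix.

    Assumed proxy for dimensionality; the paper's H(R) may be defined differently.
    """
    centered = reps - reps.mean(axis=0, keepdims=True)
    s = np.linalg.svd(centered, compute_uv=False)
    p = s / (s.sum() + eps)          # normalize singular values to a distribution
    p = p[p > eps]                   # drop numerically zero directions
    return float(np.exp(-(p * np.log(p)).sum()))


def decorrelation_penalty(reps: np.ndarray) -> float:
    """Sum of squared off-diagonal entries of the feature correlation matrix."""
    z = (reps - reps.mean(axis=0)) / (reps.std(axis=0) + 1e-8)
    corr = z.T @ z / reps.shape[0]
    off_diag = corr - np.diag(np.diag(corr))
    return float((off_diag ** 2).sum())


def uniformity_loss(reps: np.ndarray, t: float = 2.0) -> float:
    """Log of the mean Gaussian potential over sample pairs on the unit sphere.

    Lower values mean the samples are spread more uniformly.
    """
    x = reps / (np.linalg.norm(reps, axis=1, keepdims=True) + 1e-8)
    sq_dist = ((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=-1)
    upper = np.triu_indices(x.shape[0], k=1)   # each unordered pair once
    return float(np.log(np.exp(-t * sq_dist[upper]).mean()))


# Synthetic illustration: 32-dim features that either use all dimensions
# or are confined to a 4-dim subspace (a simple model of dimensional collapse).
rng = np.random.default_rng(0)
full = rng.normal(size=(256, 32))
collapsed = rng.normal(size=(256, 4)) @ rng.normal(size=(4, 32))

for name, reps in [("full", full), ("collapsed", collapsed)]:
    print(name, effective_rank(reps), decorrelation_penalty(reps), uniformity_loss(reps))
```

In this toy setting the collapsed features report an effective rank of at most 4 despite living in a 32-dimensional space, along with a larger decorrelation penalty and a worse uniformity loss, which is the situation the decorrelation and uniformity objectives described above are meant to counteract.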
