LG CV | Jun 2

IdEst: Assessing Self-Supervised Learning Representations via Intrinsic Dimension

Julie Mordacq, Vicky Kalogeiton, Steve Oudot

arXiv:2606.03338 | 32.0 | h-index: 2

AI Analysis

For researchers using SSL, IdEst offers a cheaper, hyperparameter-free geometric proxy for representation quality, though the contribution is incremental: it applies a known ID estimator to SSL evaluation.

The authors propose IdEst, a method using intrinsic dimension (ID) estimated via Minimum Spanning Tree to evaluate self-supervised learning (SSL) representations. They show that IdEst strongly correlates with linear probe performance across datasets and architectures, enabling efficient hyperparameter selection with reduced computational cost.

Self-supervised learning (SSL) has emerged as a powerful paradigm for learning meaningful representations from unlabeled data. However, the standard protocol for evaluating these representations, linear probing, is computationally expensive, sensitive to hyperparameters, and provides limited insight into the geometric structure of the representation space. In this work, motivated by connections between neural network generalization and intrinsic dimension (ID), we propose IdEst, a method for estimating the ID of SSL representations via the Minimum Spanning Tree dimension estimator ($\mathrm{dim}_\mathrm{MST}$). Across diverse datasets, architectures, and SSL pretraining objectives, we show that IdEst strongly correlates with downstream linear probe performance. Furthermore, we demonstrate that IdEst enables efficient hyperparameter selection, significantly reducing the computational cost compared to supervised alternatives. Our results highlight intrinsic dimensionality as a principled geometric proxy for assessing SSL representations, complementing standard supervised probing protocols.
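
The abstract does not spell out how $\mathrm{dim}_\mathrm{MST}$ is computed, so the following is only a minimal sketch of one common MST-based estimator, not the authors' implementation. It assumes the classical scaling law that the $\alpha$-weighted MST length of $n$ points sampled from a $d$-dimensional set grows roughly like $n^{(d-\alpha)/d}$, fits the log-log slope $\beta$ over random subsamples, and returns $d \approx \alpha / (1 - \beta)$. The function names (`mst_length`, `mst_dimension`) and all parameters (`alpha`, `n_sizes`, `n_repeats`, `seed`) are hypothetical choices made for illustration.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform


def mst_length(points, alpha=1.0):
    """Sum of Euclidean MST edge lengths, each raised to the power alpha."""
    # Dense pairwise distances; O(n^2) memory, so keep n moderate.
    dists = squareform(pdist(points))
    # SciPy treats zero entries as missing edges, so exact duplicate
    # points should be removed beforehand (assumed here).
    mst = minimum_spanning_tree(dists)
    return float(np.sum(mst.data ** alpha))


def mst_dimension(points, alpha=1.0, n_sizes=8, n_repeats=5, seed=0):
    """Estimate intrinsic dimension from the growth rate of the MST length.

    Illustrative sketch: fits log L_alpha(m) against log m over random
    subsamples of size m and inverts the assumed law slope = (d - alpha) / d.
    """
    points = np.asarray(points, dtype=float)
    n = len(points)
    min_size = max(n // 16, 32)
    if n <= min_size:
        raise ValueError("need more points than the smallest subsample size")
    rng = np.random.default_rng(seed)
    sizes = np.unique(np.geomspace(min_size, n, n_sizes).astype(int))

    log_sizes, log_lengths = [], []
    for m in sizes:
        # Average over several random subsamples to reduce variance.
        lengths = [
            mst_length(points[rng.choice(n, size=m, replace=False)], alpha)
            for _ in range(n_repeats)
        ]
        log_sizes.append(np.log(m))
        log_lengths.append(np.log(np.mean(lengths)))

    slope, _ = np.polyfit(log_sizes, log_lengths, 1)
    return alpha / (1.0 - slope)
```

In an SSL setting, `points` would be the matrix of embeddings produced by a frozen encoder (one row per sample). The estimate needs no labels and no probe training, which is the source of the cost savings the abstract describes. Because the estimate is sensitive to the fitted slope, small sample sizes and boundary effects can bias it, particularly in higher dimensions.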
