LG AI · May 13, 2022

Toward a Geometrical Understanding of Self-supervised Contrastive Learning

Romain Cosentino, Anirvan Sengupta, Salman Avestimehr, Mahdi Soltanolkotabi, Antonio Ortega, Ted Willke, Mariano Tepper

arXiv:2205.06926v2 · 15.118 citations · h-index: 40

Originality: Incremental advance

AI Analysis

This paper addresses a key problem in self-supervised learning (SSL): it gives a geometrical account of the learned representations, which could help researchers build more robust and interpretable models. The advance is incremental, since it builds on existing SSL frameworks.

The paper investigates why the projector in self-supervised contrastive learning generalizes poorly compared to the encoder, finding that stronger data augmentations cause the projector to become invariant by projecting data into a low-dimensional space, which is a noisy estimate of the data manifold tangent plane.

Self-supervised learning (SSL) is currently one of the premier techniques to create data representations that are actionable for transfer learning in the absence of human annotations. Despite their success, the underlying geometry of these representations remains elusive, which obfuscates the quest for more robust, trustworthy, and interpretable models. In particular, mainstream SSL techniques rely on a specific deep neural network architecture with two cascaded neural networks: the encoder and the projector. When used for transfer learning, the projector is discarded since empirical results show that its representation generalizes more poorly than the encoder's. In this paper, we investigate this curious phenomenon and analyze how the strength of the data augmentation policies affects the data embedding. We discover a non-trivial relation between the encoder, the projector, and the data augmentation strength: with increasingly larger augmentation policies, the projector, rather than the encoder, is more strongly driven to become invariant to the augmentations. It does so by eliminating crucial information about the data by learning to project it into a low-dimensional space, a noisy estimate of the data manifold tangent plane in the encoder representation. This analysis is substantiated through a geometrical perspective with theoretical and empirical results.
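The encoder/projector cascade described in the abstract can be illustrated with a toy sketch. Everything here is an assumption made for illustration and is not taken from the paper: the layers are small random linear maps (`W_enc`, `W_proj`), the augmentation is additive Gaussian noise (`augment`), and the objective is an NT-Xent-style contrastive loss of the kind used in SimCLR, which the abstract does not name. The sketch only shows where the encoder output `h` (kept for transfer learning) and the projector output `z` (fed to the loss, then discarded) sit in the pipeline.

```python
import math
import random

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def relu(v):
    return [max(0.0, a) for a in v]

def cosine(u, v):
    """Cosine similarity, with a small epsilon to avoid division by zero."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv + 1e-12)

def random_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 1.0 / math.sqrt(cols)) for _ in range(cols)]
            for _ in range(rows)]

def nt_xent(z_a, z_b, temperature=0.5):
    """NT-Xent-style loss over N positive pairs (z_a[i], z_b[i])."""
    z = z_a + z_b
    n = len(z_a)
    total = 0.0
    for i in range(2 * n):
        pos = (i + n) % (2 * n)  # index of the other view of the same sample
        logits = [cosine(z[i], z[j]) / temperature
                  for j in range(2 * n) if j != i]
        pos_logit = cosine(z[i], z[pos]) / temperature
        m = max(logits)  # subtract the max for a stable log-sum-exp
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denominator - pos_logit
    return total / (2 * n)

rng = random.Random(0)
W_enc = random_matrix(8, 16, rng)   # hypothetical encoder weights: 16 -> 8
W_proj = random_matrix(4, 8, rng)   # hypothetical projector weights: 8 -> 4

def encoder(x):
    """Toy encoder; its output h is what transfer learning would keep."""
    return relu(matvec(W_enc, x))

def projector(h):
    """Toy projector; its output z is only used by the contrastive loss."""
    return matvec(W_proj, h)

def augment(x, strength, rng):
    """Stand-in augmentation: additive Gaussian noise of a given strength."""
    return [xi + rng.gauss(0.0, strength) for xi in x]

batch = [[rng.gauss(0.0, 1.0) for _ in range(16)] for _ in range(4)]
views_a = [augment(x, 0.1, rng) for x in batch]
views_b = [augment(x, 0.1, rng) for x in batch]

h_a = [encoder(v) for v in views_a]             # encoder representation
z_a = [projector(h) for h in h_a]               # projector representation
z_b = [projector(encoder(v)) for v in views_b]
loss = nt_xent(z_a, z_b)
```

Because the weights are random and untrained, the loss value itself carries no meaning. To explore the abstract's claim, one would train both maps and vary `strength`, then compare how the dimensionality of `h` and `z` changes; that experiment is outside this sketch.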
