LG · Jul 20, 2023

The Role of Entropy and Reconstruction in Multi-View Self-Supervised Learning

Apple
arXiv:2307.10907v2 · 15 citations · h-index: 13 · Has Code
Originality: Incremental advance
AI Analysis

This work provides a theoretical framework for MVSSL that offers insight into existing methods and improves their training stability. The advance is incremental but useful for researchers in self-supervised learning.

The paper tackles the problem of understanding the mechanisms behind multi-view self-supervised learning (MVSSL) by introducing an entropy and reconstruction (ER) bound on mutual information (MI). Through this bound, it shows that clustering-based methods such as DeepCluster and SwAV maximize MI, and that distillation-based approaches such as BYOL and DINO explicitly maximize the reconstruction term while implicitly keeping the entropy stable. Replacing these methods' objectives with the ER bound achieves competitive performance and keeps training stable with smaller batch sizes or smaller EMA coefficients.
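
The ER bound follows from writing the MI as an entropy minus a conditional entropy and replacing the unknown conditional with any variational conditional q. A minimal derivation, assuming Z_1 and Z_2 denote the representations of the two views (this notation is ours and may differ from the paper's):

```latex
\begin{aligned}
I(Z_1; Z_2) &= H(Z_2) - H(Z_2 \mid Z_1) \\
&= H(Z_2) + \mathbb{E}_{p(z_1, z_2)}\big[\log q(z_2 \mid z_1)\big]
   + \mathbb{E}_{p(z_1)}\big[\mathrm{KL}\big(p(z_2 \mid z_1) \,\|\, q(z_2 \mid z_1)\big)\big] \\
&\ge \underbrace{H(Z_2)}_{\text{entropy}}
   + \underbrace{\mathbb{E}_{p(z_1, z_2)}\big[\log q(z_2 \mid z_1)\big]}_{\text{reconstruction}}
\end{aligned}
```

The gap is the expected KL divergence, which is non-negative, so the bound is tight when q matches the true conditional p(z_2 | z_1).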

The mechanisms behind the success of multi-view self-supervised learning (MVSSL) are not yet fully understood. Contrastive MVSSL methods have been studied through the lens of InfoNCE, a lower bound of the Mutual Information (MI). However, the relation between other MVSSL methods and MI remains unclear. We consider a different lower bound on the MI consisting of an entropy and a reconstruction term (ER), and analyze the main MVSSL families through its lens. Through this ER bound, we show that clustering-based methods such as DeepCluster and SwAV maximize the MI. We also re-interpret the mechanisms of distillation-based approaches such as BYOL and DINO, showing that they explicitly maximize the reconstruction term and implicitly encourage a stable entropy, and we confirm this empirically. We show that replacing the objectives of common MVSSL methods with this ER bound achieves competitive performance, while making them stable when training with smaller batch sizes or smaller exponential moving average (EMA) coefficients. GitHub repo: https://github.com/apple/ml-entropy-reconstruction.
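
For the clustering-based case, both ER terms can be estimated directly from a batch of cluster scores. The sketch below is an illustration under stated assumptions, not the code in apple/ml-entropy-reconstruction. It assumes each view produces K cluster logits, uses the softmax of view 2 as that view's soft assignment, uses the softmax of view 1 as the variational predictor q(z2 | z1), and estimates the entropy from the batch-averaged assignment. The names `log_softmax` and `er_bound` are hypothetical.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def log_softmax(row: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax of one row of logits."""
    m = max(row)
    log_norm = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_norm for x in row]


def er_bound(logits_view1: Matrix, logits_view2: Matrix) -> float:
    """Plug-in batch estimate of H(Z2) + E[log q(Z2 | Z1)] over K discrete clusters.

    Hypothetical helper for illustration; the paper's estimators may differ.
    logits_view1: N rows of K scores from view 1; their softmax is used as q(z2 | z1).
    logits_view2: N rows of K scores from view 2; their softmax is used as the
                  soft cluster assignment of view 2.
    """
    n = len(logits_view2)
    k = len(logits_view2[0])
    p2 = [[math.exp(v) for v in log_softmax(row)] for row in logits_view2]
    log_q1 = [log_softmax(row) for row in logits_view1]

    # Entropy term: entropy of the batch-averaged (marginal) cluster distribution.
    marginal = [sum(p[j] for p in p2) / n for j in range(k)]
    entropy = -sum(m * math.log(m) for m in marginal if m > 0.0)

    # Reconstruction term: average log-likelihood of view-2 assignments under q.
    reconstruction = sum(
        sum(p[j] * lq[j] for j in range(k)) for p, lq in zip(p2, log_q1)
    ) / n

    return entropy + reconstruction
```

For example, when view 1's logits equal view 2's, each reconstruction summand is as large as it can be for those assignments; shuffling the rows of view 1 can only lower it (Gibbs' inequality) while the entropy term is unchanged. The estimate also never exceeds log K, because the entropy over K clusters is at most log K and the reconstruction term is non-positive.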

Code Implementations: 1 repo