LG | Oct 4, 2022

Contrastive Learning Can Find An Optimal Basis For Approximately View-Invariant Functions

DeepMind | U of Toronto
arXiv:2210.01883v2 | 31 citations | h-index: 30
AI Analysis

This paper provides theoretical insight into contrastive learning for self-supervised representation learning, but the contribution is incremental: it builds on existing methods, and its empirical validation is limited to synthetic tasks.

The paper shows that several contrastive learning methods can be reinterpreted as learning kernel functions that approximate a fixed positive-pair kernel. It then proves that combining this kernel with PCA minimizes the worst-case error of linear predictors, under the assumption that positive pairs have similar labels, and validates the approach empirically on synthetic tasks.
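The link between Kernel PCA and the positive-pair Markov chain can be checked numerically on a small finite input space. The sketch below is illustrative and rests on assumptions not taken from the paper: a toy generative process in which a hypothetical latent z produces two independent views from the same conditional A(x|z), a positive-pair kernel taken as K+(x, x') = p+(x, x') / (p(x) p(x')), and an uncentered Kernel PCA weighted by the population marginal p(x). All function names are hypothetical.

```python
import numpy as np


def positive_pair_joint(num_views, num_latents, seed=0):
    """Toy positive-pair distribution p+(x, x') = sum_z p(z) A(x|z) A(x'|z).

    Assumption (illustrative only): a latent z produces two independent views
    from the same conditional A(.|z), which makes p+ symmetric and PSD.
    """
    rng = np.random.default_rng(seed)
    p_z = rng.dirichlet(np.ones(num_latents))                # p(z)
    A = rng.dirichlet(np.ones(num_views), size=num_latents)  # A[z, x] = A(x|z)
    return A.T @ np.diag(p_z) @ A                            # joint[x, x'] = p+(x, x')


def markov_eigenfunctions(joint, k):
    """Top-k eigenpairs of the Markov chain P(x'|x) = p+(x, x') / p(x)."""
    p = joint.sum(axis=1)                                    # marginal p(x)
    # S = D^{-1/2} P+ D^{-1/2} is symmetric and similar to the transition matrix.
    S = joint / np.sqrt(np.outer(p, p))
    vals, U = np.linalg.eigh(S)                              # ascending eigenvalues
    top = np.argsort(vals)[::-1][:k]
    f = U[:, top] / np.sqrt(p)[:, None]                      # orthonormal in L2(p)
    return vals[top], f, p


def kernel_pca_features(joint, k):
    """Uncentered Kernel PCA of K+ under the population distribution p."""
    p = joint.sum(axis=1)
    K = joint / np.outer(p, p)                               # K+(x, x')
    sq = np.sqrt(p)
    G = sq[:, None] * K * sq[None, :]                        # p-weighted Gram matrix
    lam, V = np.linalg.eigh(G)
    top = np.argsort(lam)[::-1][:k]
    lam, V = lam[top], V[:, top]
    # Projection of x onto component j: sum_y K+(x, y) sqrt(p(y)) V[y, j] / sqrt(lam_j)
    return (K * sq[None, :]) @ V / np.sqrt(lam)


joint = positive_pair_joint(num_views=8, num_latents=5)
eigvals, f, p = markov_eigenfunctions(joint, k=3)
feats = kernel_pca_features(joint, k=3)
# Eigenvector signs are arbitrary, so align them before comparing
# feats[:, j] against sqrt(lam_j) * f[:, j].
signs = np.sign(np.sum(feats * f, axis=0))
print(np.allclose(feats, f * np.sqrt(eigvals) * signs))
```

Under these assumptions, the Kernel PCA projections equal the Markov chain eigenfunctions scaled by the square roots of their eigenvalues, and the top eigenfunction is the constant function with eigenvalue 1. A centered Kernel PCA would simply drop that constant component.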

Contrastive learning is a powerful framework for learning self-supervised representations that generalize well to downstream supervised tasks. We show that multiple existing contrastive learning methods can be reinterpreted as learning kernel functions that approximate a fixed positive-pair kernel. We then prove that a simple representation obtained by combining this kernel with PCA provably minimizes the worst-case approximation error of linear predictors, under a straightforward assumption that positive pairs have similar labels. Our analysis is based on a decomposition of the target function in terms of the eigenfunctions of a positive-pair Markov chain, and a surprising equivalence between these eigenfunctions and the output of Kernel PCA. We give generalization bounds for downstream linear prediction using our Kernel PCA representation, and show empirically on a set of synthetic tasks that applying Kernel PCA to contrastive learning models can indeed approximately recover the Markov chain eigenfunctions, although the accuracy depends on the kernel parameterization as well as on the augmentation strength.
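One way to see why the top eigenfunctions suit targets on which positive pairs agree is to expand a target g in the eigenbasis. The derivation below is a hedged sketch, not the paper's exact theorem. It assumes that the Markov chain eigenfunctions f_i form an orthonormal basis of L^2(p) with sorted eigenvalues 1 = λ_1 ≥ λ_2 ≥ ⋯ ≥ 0, and it measures "similar labels for positive pairs" by a hypothetical budget ε on the expected squared label difference over positive pairs.

```latex
\begin{align*}
g &= \sum_i c_i f_i, \qquad c_i = \mathbb{E}_p[g(x)\, f_i(x)] \\
\mathbb{E}_{p_+}\!\big[(g(x) - g(x'))^2\big]
  &= 2\,\mathbb{E}_p[g^2] - 2\,\mathbb{E}_{p_+}[g(x)\, g(x')]
   = 2 \sum_i (1 - \lambda_i)\, c_i^2 \le \varepsilon \\
\min_{w \in \mathbb{R}^k} \mathbb{E}_p\Big[\big(g - \textstyle\sum_{i \le k} w_i f_i\big)^2\Big]
  &= \sum_{i > k} c_i^2
   \le \frac{1}{1 - \lambda_{k+1}} \sum_{i > k} (1 - \lambda_i)\, c_i^2
   \le \frac{\varepsilon}{2\,(1 - \lambda_{k+1})}
\end{align*}
```

The second line uses that both marginals of p+ equal p and that the expected product over positive pairs equals the sum of λ_i c_i^2. The last line uses λ_i ≤ λ_{k+1} for i > k. A linear predictor on the top-k eigenfunctions therefore has small error whenever λ_{k+1} is bounded away from 1.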
