Understanding Deep Contrastive Learning via Coordinate-wise Optimization
This provides a theoretical framework for contrastive learning that could aid in designing new loss functions, though it is incremental in advancing understanding rather than solving a broad practical problem.
The paper tackles the problem of understanding deep contrastive learning by proposing a unified coordinate-wise optimization formulation called α-CL, which unifies existing contrastive losses and enables the design of novel losses that achieve comparable or better performance on datasets like CIFAR10, STL-10, and CIFAR-100 compared to InfoNCE.
We show that Contrastive Learning (CL) under a broad family of loss functions (including InfoNCE) has a unified formulation of coordinate-wise optimization on the network parameter $\boldsymbolθ$ and pairwise importance $α$, where the \emph{max player} $\boldsymbolθ$ learns representation for contrastiveness, and the \emph{min player} $α$ puts more weights on pairs of distinct samples that share similar representations. The resulting formulation, called $α$-CL, unifies not only various existing contrastive losses, which differ by how sample-pair importance $α$ is constructed, but also is able to extrapolate to give novel contrastive losses beyond popular ones, opening a new avenue of contrastive loss design. These novel losses yield comparable (or better) performance on CIFAR10, STL-10 and CIFAR-100 than classic InfoNCE. Furthermore, we also analyze the max player in detail: we prove that with fixed $α$, max player is equivalent to Principal Component Analysis (PCA) for deep linear network, and almost all local minima are global and rank-1, recovering optimal PCA solutions. Finally, we extend our analysis on max player to 2-layer ReLU networks, showing that its fixed points can have higher ranks.