LG, CV | Oct 13, 2021

Decoupled Contrastive Learning

Chun-Hsiao Yeh, Cheng-Yao Hong, Yen-Chi Hsu, Tyng-Luh Liu, Yubei Chen, Yann LeCun

arXiv:2110.06848v331.3253 citations

Originality: Incremental advance

AI Analysis

This work addresses the high computational cost of contrastive learning in self-supervised learning, offering an incremental improvement in efficiency and performance.

The paper tackles the computational inefficiency of contrastive learning by identifying a negative-positive-coupling (NPC) effect in the InfoNCE loss and proposing a decoupled contrastive learning (DCL) loss that improves learning efficiency. SimCLR with DCL reaches 68.2% ImageNet-1K top-1 accuracy with batch size 256 in 200 epochs of pre-training, outperforming the SimCLR baseline by 6.4%. Combined with NNCLR, DCL reaches 72.3%, which the authors report as a new SOTA in contrastive learning.

Contrastive learning (CL) is one of the most successful paradigms for self-supervised learning (SSL). In a principled way, it treats two augmented "views" of the same image as positives to be pulled closer, and all other images as negatives to be pushed further apart. However, behind the impressive success of CL-based techniques, their formulation often relies on computation-heavy settings, including large sample batches and extensive training epochs. We are thus motivated to tackle these issues and establish a simple, efficient, yet competitive baseline for contrastive learning. Specifically, we identify, from theoretical and empirical studies, a noticeable negative-positive-coupling (NPC) effect in the widely used InfoNCE loss, which makes learning efficiency depend unfavorably on the batch size. To remove the NPC effect, we propose the decoupled contrastive learning (DCL) loss, which drops the positive term from the denominator and significantly improves learning efficiency. DCL achieves competitive performance while requiring neither the large batches of SimCLR, nor the momentum encoding of MoCo, nor a large number of epochs. We demonstrate its effectiveness on various benchmarks and show that it is much less sensitive to suboptimal hyperparameters. Notably, SimCLR with DCL achieves 68.2% ImageNet-1K top-1 accuracy using batch size 256 within 200 epochs of pre-training, outperforming its SimCLR baseline by 6.4%. Further, DCL can be combined with the SOTA contrastive learning method, NNCLR, to achieve 72.3% ImageNet-1K top-1 accuracy with batch size 512 in 400 epochs, which represents a new SOTA in contrastive learning. We believe DCL provides a valuable baseline for future contrastive SSL studies.
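The decoupling described in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation: the function name `contrastive_losses`, the SimCLR-style choice of in-batch negatives, the cosine similarity, and the temperature value are all assumptions made for illustration. The sketch computes both losses from the same similarities, so the only difference is whether the positive term appears inside the logarithm.

```python
import numpy as np


def contrastive_losses(z1, z2, temperature=0.1):
    """Return (infonce, dcl) mean losses for two batches of paired views.

    Hypothetical helper: z1[i] and z2[i] are embeddings of two views of
    the same image; every other embedding in the batch is a negative.
    """
    n = z1.shape[0]
    z = np.concatenate([z1, z2], axis=0)                # (2N, d)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)    # cosine similarity
    sim = (z @ z.T) / temperature                       # (2N, 2N)

    idx = np.arange(2 * n)
    pos_idx = (idx + n) % (2 * n)                       # index of the paired view
    pos = sim[idx, pos_idx]                             # positive logits

    # Negatives: everything except the sample itself and its positive.
    neg_mask = np.ones((2 * n, 2 * n), dtype=bool)
    neg_mask[idx, idx] = False
    neg_mask[idx, pos_idx] = False
    neg_sum = np.where(neg_mask, np.exp(sim), 0.0).sum(axis=1)

    # InfoNCE: the positive term also sits in the denominator.
    infonce = -pos + np.log(np.exp(pos) + neg_sum)
    # DCL: the positive term is removed from the denominator.
    dcl = -pos + np.log(neg_sum)
    return infonce.mean(), dcl.mean()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.normal(size=(8, 16))
    b = a + 0.1 * rng.normal(size=(8, 16))              # noisy second "view"
    print(contrastive_losses(a, b))
```

Because the embeddings are normalized, the logits stay within plus or minus 1/temperature, so the plain `exp` is safe here. A training implementation would more likely use a log-sum-exp routine in an autodiff framework.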
