LG, ML | Oct 5, 2020

Conditional Negative Sampling for Contrastive Learning of Visual Representations

arXiv:2010.02037v1 | 89 citations
Originality: Incremental advance
AI Analysis

This work targets how negative examples are chosen in contrastive learning of unsupervised visual representations, offering an incremental improvement over existing contrastive methods.

The paper improves contrastive learning of visual representations by selecting more difficult negative examples, meaning those more similar to the current instance, and shows that this yields stronger representations. Applied on top of existing models (IR, CMC, and MoCo), the method improves accuracy by 2-5 percentage points under linear evaluation on four standard image datasets. The benefits persist when features are transferred to new image distributions and to downstream tasks.

Recent methods for learning unsupervised visual representations, dubbed contrastive learning, optimize the noise-contrastive estimation (NCE) bound on mutual information between two views of an image. NCE uses randomly sampled negative examples to normalize the objective. In this paper, we show that choosing difficult negatives, or those more similar to the current instance, can yield stronger representations. To do this, we introduce a family of mutual information estimators that sample negatives conditionally -- in a "ring" around each positive. We prove that these estimators lower-bound mutual information, with higher bias but lower variance than NCE. Experimentally, we find our approach, applied on top of existing models (IR, CMC, and MoCo) improves accuracy by 2-5% points in each case, measured by linear evaluation on four standard image datasets. Moreover, we find continued benefits when transferring features to a variety of new image distributions from the Meta-Dataset collection and to a variety of downstream tasks such as object detection, instance segmentation, and keypoint detection.
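The idea of sampling negatives in a "ring" around each instance can be illustrated with a minimal sketch. The abstract does not say how the ring is delimited, so the code below assumes it is defined by two percentile bounds on similarity to the anchor: candidates that are too dissimilar (easy) or too similar (possibly the same content) are dropped. The function names `ring_negatives` and `info_nce_loss`, the percentile defaults, and the temperature are all hypothetical choices for illustration, not the authors' implementation.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]


def ring_negatives(anchor, candidates, lower_pct=50, upper_pct=95):
    """Return indices of candidates whose similarity rank lies in the ring.

    Assumption: the ring is set by percentile bounds on similarity to the
    anchor. Candidates are ranked from least to most similar, and ranks in
    [lower_pct%, upper_pct%) are kept.
    """
    order = sorted(range(len(candidates)),
                   key=lambda i: dot(anchor, candidates[i]))
    start = len(order) * lower_pct // 100
    stop = len(order) * upper_pct // 100
    return order[start:stop]


def info_nce_loss(anchor, positive, negatives, temperature=0.07):
    """Standard NCE-style loss: -log softmax of the positive logit."""
    pos_logit = dot(anchor, positive) / temperature
    logits = [pos_logit] + [dot(anchor, n) / temperature for n in negatives]
    m = max(logits)  # subtract the max for numerical stability
    log_norm = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_norm - pos_logit


# Toy data: random unit vectors stand in for learned image embeddings.
random.seed(0)
dim = 16
anchor = normalize([random.gauss(0, 1) for _ in range(dim)])
positive = normalize([a + 0.1 * random.gauss(0, 1) for a in anchor])
bank = [normalize([random.gauss(0, 1) for _ in range(dim)])
        for _ in range(200)]

ring_idx = ring_negatives(anchor, bank)
loss_ring = info_nce_loss(anchor, positive, [bank[i] for i in ring_idx])
print(len(ring_idx), loss_ring)
```

Setting `lower_pct=0` and `upper_pct=100` recovers the usual case in which every candidate may serve as a negative, which makes the ring a strict restriction of ordinary NCE sampling in this sketch.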

Code Implementations: 1 repo