Discovering Global False Negatives On the Fly for Self-supervised Contrastive Learning
This work addresses false negatives, a key bottleneck in self-supervised contrastive learning for computer vision and multimodal applications, with a method that is more efficient and scalable than local, within-batch detection.
The paper tackles the problem of false negatives in self-supervised contrastive learning, where semantically similar samples are incorrectly treated as negatives. It introduces GloFND, an optimization-based method that automatically learns thresholds to identify false negatives globally across the dataset. The method improves performance on image and image-text data.
In self-supervised contrastive learning, negative pairs are typically constructed using an anchor image and a sample drawn from the entire dataset, excluding the anchor. However, this approach can create negative pairs with similar semantics, referred to as "false negatives", whose embeddings are then wrongly pushed apart. To address this issue, we introduce GloFND, an optimization-based approach that automatically learns, on the fly during training, a threshold for each anchor to identify its false negatives. In contrast to previous methods for false negative discovery, our approach detects false negatives globally across the entire dataset rather than locally within the mini-batch. Moreover, its per-iteration computation cost remains independent of the dataset size. Experimental results on image and image-text data demonstrate the effectiveness of the proposed method. Our implementation is available at https://github.com/vibalcam/GloFND.
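
To make the idea of a per-anchor threshold learned on the fly more concrete, the following is a minimal sketch and not the authors' implementation (the repository above is the reference). It assumes that each anchor's threshold should track the similarity level above which a fraction `alpha` of other samples falls, and it uses a simple stochastic quantile-tracking update. The function name `update_thresholds`, the parameters `alpha` and `lr`, and the update rule itself are illustrative assumptions.

```python
import numpy as np


def update_thresholds(lam, anchor_idx, sim, alpha=0.1, lr=0.05):
    """One stochastic threshold update for the anchors in a mini-batch.

    Hypothetical sketch, not the GloFND update rule from the paper.

    lam        : (N,) array, one threshold per sample in the dataset
    anchor_idx : (B,) array of unique dataset indices for the batch anchors
    sim        : (B, B) array of pairwise similarities within the batch
    alpha      : assumed target fraction of samples flagged per anchor
    lr         : assumed step size for the threshold update

    Returns a (B, B) boolean mask of candidate false negatives, computed
    with the thresholds as they were before this update. Requires B >= 2.
    """
    batch_size = sim.shape[0]
    # An anchor is never its own negative, so ignore the diagonal.
    off_diag = ~np.eye(batch_size, dtype=bool)
    # Flag batch samples whose similarity exceeds the anchor's threshold.
    above = (sim > lam[anchor_idx][:, None]) & off_diag
    frac_above = above.sum(axis=1) / (batch_size - 1)
    # Raise the threshold when too many samples are flagged, lower it when
    # too few are. Only the B thresholds of the batch anchors are touched,
    # so the cost of one step does not grow with the dataset size N.
    lam[anchor_idx] += lr * (frac_above - alpha)
    return above
```

In a training loop, the returned mask could be used to drop the flagged pairs from the negative term of the contrastive loss. Because the thresholds persist across iterations, each one reflects samples seen in earlier batches and not only the current mini-batch.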