Locality-Sensitive Hashing for Efficient Hard Negative Sampling in Contrastive Learning
The work addresses an efficiency bottleneck in contrastive learning that matters to researchers and practitioners working with large datasets, although the contribution is incremental because it builds on existing hashing techniques.
The paper tackles the computational challenge of efficiently finding high-quality hard negative examples in contrastive learning for large, high-dimensional datasets by proposing a GPU-friendly Locality-Sensitive Hashing scheme, achieving comparable or better performance with significantly less computation than existing methods.
Contrastive learning is a representation learning paradigm in which a neural network maps data elements to feature vectors. It improves the feature space by forming groups consisting of an anchor and examples that are either positive or negative depending on their class relation to the anchor. Hard negative examples, which lie close to the anchor in the feature space but belong to a different class, improve learning performance. Finding such high-quality examples efficiently in large, high-dimensional datasets is computationally challenging. In this paper, we propose a GPU-friendly Locality-Sensitive Hashing (LSH) scheme that quantizes real-valued feature vectors into binary representations for approximate nearest neighbor search. We investigate its theoretical properties and evaluate it on several datasets from the textual and visual domains. Our approach achieves comparable or better performance while requiring significantly less computation than existing hard negative mining strategies.
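To make the idea of mining hard negatives from binary codes concrete, here is a minimal sketch. It assumes a standard random-hyperplane (SimHash-style) LSH, which is not necessarily the quantization scheme proposed in the paper: each bit is the sign of the dot product between a feature vector and a random Gaussian hyperplane, so two vectors at angle θ agree on a given bit with probability 1 − θ/π. Candidates from a different class are then ranked by Hamming distance to the anchor's code. The function names (`make_hyperplanes`, `hash_vector`, `hamming`, `mine_hard_negatives`) and the toy data are hypothetical and only serve as illustration.

```python
import random

# Assumption: random-hyperplane (SimHash-style) LSH is used as a stand-in;
# the paper's own GPU-friendly quantization scheme may differ.

def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random Gaussian hyperplanes through the origin."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

def hash_vector(vec, planes):
    """Quantize a real-valued vector into an integer bit code.

    Bit i is set when the vector lies on the non-negative side of plane i.
    """
    code = 0
    for i, plane in enumerate(planes):
        if sum(v * p for v, p in zip(vec, plane)) >= 0:
            code |= 1 << i
    return code

def hamming(a, b):
    """Number of differing bits between two integer codes."""
    return bin(a ^ b).count("1")

def mine_hard_negatives(anchor_idx, codes, labels, k=1):
    """Return the k examples of a different class whose codes are closest
    (in Hamming distance) to the anchor's code; ties go to the lower index."""
    anchor_code = codes[anchor_idx]
    anchor_label = labels[anchor_idx]
    candidates = [
        (hamming(anchor_code, code), j)
        for j, code in enumerate(codes)
        if labels[j] != anchor_label
    ]
    candidates.sort()
    return [j for _, j in candidates[:k]]

# Hypothetical toy data: 2-D features with class labels.
features = [
    [1.0, 0.0],   # 0: anchor, class 0
    [0.9, 0.1],   # 1: class 1, close to the anchor (hard negative)
    [-1.0, 0.0],  # 2: class 1, opposite the anchor (easy negative)
    [0.8, 0.2],   # 3: class 0, a positive, so never returned as a negative
]
labels = [0, 1, 1, 0]

planes = make_hyperplanes(dim=2, n_bits=64, seed=42)
codes = [hash_vector(f, planes) for f in features]
hard = mine_hard_negatives(0, codes, labels, k=1)
print(hard)  # [1]
```

The point of the binarization is that comparing codes needs only XOR and popcount instead of floating-point distances. On a GPU, the per-vector loop in `hash_vector` would typically become a single matrix multiplication followed by a sign test over the whole batch; the plain-Python loops here trade that speed for readability.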