LG STNov 4, 2021

Hard Negative Sampling via Regularized Optimal Transport for Contrastive Representation Learning

Ruijie Jiang, Prakash Ishwar, Shuchin Aeron

arXiv:2111.03169v35.510 citationsHas Code

Originality Incremental advance

AI Analysis

This work addresses a key bottleneck in contrastive learning for machine learning practitioners by providing a theoretical justification and improved method for negative sampling, though it is incremental as it builds on existing optimal transport approaches.

The paper tackles the problem of designing hard negative sampling distributions for unsupervised contrastive representation learning by proposing a min-max framework and using regularized optimal transport to control negative example hardness, showing that the generated negative samples are more similar to anchors and improving downstream task performance with a new ground cost.

We study the problem of designing hard negative sampling distributions for unsupervised contrastive representation learning. We propose and analyze a novel min-max framework that seeks a representation which minimizes the maximum (worst-case) generalized contrastive learning loss over all couplings (joint distributions between positive and negative samples subject to marginal constraints) and prove that the resulting min-max optimum representation will be degenerate. This provides the first theoretical justification for incorporating additional regularization constraints on the couplings. We re-interpret the min-max problem through the lens of Optimal Transport (OT) theory and utilize regularized transport couplings to control the degree of hardness of negative examples. Through experiments we demonstrate that the negative samples generated from our designed negative distribution are more similar to the anchor than those generated from the baseline negative distribution. We also demonstrate that entropic regularization yields negative sampling distributions with parametric form similar to that in a recent state-of-the-art negative sampling design and has similar performance in multiple datasets. Utilizing the uncovered connection with OT, we propose a new ground cost for designing the negative distribution and show improved performance of the learned representation on downstream tasks compared to the representation learned when using squared Euclidean cost.

View on arXiv PDF Code

Similar