CL · LG · Apr 13, 2021

Understanding Hard Negatives in Noise Contrastive Estimation

arXiv:2104.06245v1737 citations
AI Analysis

This work addresses a foundational question in contrastive learning: it gives an analytical justification for hard negatives, though it builds incrementally on existing methods.

The paper examines the role of hard negatives in noise contrastive estimation. It shows, theoretically and empirically, that drawing negatives from the model distribution reduces the bias of the contrastive loss as an estimator of the cross-entropy gradient, and it reports strong results on zero-shot entity linking.

The choice of negative examples is important in noise contrastive estimation. Recent works find that hard negatives -- highest-scoring incorrect examples under the model -- are effective in practice, but they are used without a formal justification. We develop analytical tools to understand the role of hard negatives. Specifically, we view the contrastive loss as a biased estimator of the gradient of the cross-entropy loss, and show both theoretically and empirically that setting the negative distribution to be the model distribution results in bias reduction. We also derive a general form of the score function that unifies various architectures used in text retrieval. By combining hard negatives with appropriate score functions, we obtain strong results on the challenging task of zero-shot entity linking.
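
To make the abstract's setup concrete, the following is a minimal sketch, not the authors' implementation. It compares the full cross-entropy loss with a contrastive loss computed over the gold label plus K negatives, chosen either at random or as the K highest-scoring incorrect labels. The function names, the synthetic Gaussian scores, and the values of `gold` and `k` are all assumptions made for illustration. The sketch compares loss values only, which is a simpler proxy for the gradient-bias analysis the paper describes.

```python
import math
import random


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) over a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def cross_entropy(scores, gold):
    """Full cross-entropy loss: normalizes over every label."""
    return log_sum_exp(scores) - scores[gold]


def contrastive_loss(scores, gold, negatives):
    """Contrastive loss: normalizes over the gold label and the chosen negatives only."""
    candidates = [scores[gold]] + [scores[j] for j in negatives]
    return log_sum_exp(candidates) - scores[gold]


def hard_negatives(scores, gold, k):
    """The k highest-scoring incorrect labels under the current scores."""
    incorrect = [j for j in range(len(scores)) if j != gold]
    return sorted(incorrect, key=lambda j: scores[j], reverse=True)[:k]


def random_negatives(scores, gold, k, rng):
    """k incorrect labels drawn uniformly without replacement."""
    incorrect = [j for j in range(len(scores)) if j != gold]
    return rng.sample(incorrect, k)


# Hypothetical setup: 1000 labels with synthetic scores (illustration only).
rng = random.Random(0)
scores = [rng.gauss(0.0, 1.0) for _ in range(1000)]
gold, k = 3, 10

full = cross_entropy(scores, gold)
hard = contrastive_loss(scores, gold, hard_negatives(scores, gold, k))
rand = contrastive_loss(scores, gold, random_negatives(scores, gold, k, rng))

print(f"cross-entropy: {full:.4f}")
print(f"contrastive, hard negatives: {hard:.4f}")
print(f"contrastive, random negatives: {rand:.4f}")
```

Because the contrastive loss normalizes over a subset of the labels, it never exceeds the full cross-entropy, and among all size-K subsets the highest-scoring negatives give the value closest to it. This ordering holds for loss values for any fixed scores; the paper's claim concerns the bias of the gradient estimate, which this sketch does not measure.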

Code Implementations: 1 repo