ML · LG · Feb 26, 2024

On the connection between Noise-Contrastive Estimation and Contrastive Divergence

arXiv:2402.16688v1 · 3 citations · h-index: 3 · AISTATS
Originality: Synthesis-oriented
AI Analysis

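This work clarifies the theoretical connections between noise-contrastive estimation, maximum likelihood estimation and contrastive divergence, three popular methods for estimating unnormalized probabilistic models. This may help researchers in machine learning and statistics transfer techniques between them.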

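The paper shows that two noise-contrastive estimation (NCE) criteria, ranking NCE (RNCE) and conditional NCE (CNCE), can be viewed as maximum likelihood (ML) estimation methods: RNCE is equivalent to ML estimation with conditional importance sampling, and both RNCE and CNCE are special cases of contrastive divergence (CD). This bridges the two families of methods and lets extensions from the ML and CD literature carry over to NCE.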
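A short derivation sketch of both claims follows. The notation is assumed here rather than copied from the paper: the model is $p_\theta(x) = \tilde p_\theta(x)/Z(\theta)$, $x_0$ is an observed point, $x_1,\dots,x_J$ are draws from a noise distribution $q$ that does not depend on $\theta$, and for CNCE $y \sim q(\cdot \mid x)$ is conditional noise. For RNCE, the negative gradient is the ML gradient with $\mathbb{E}_{p_\theta}[\nabla_\theta \log \tilde p_\theta(x)] = \nabla_\theta \log Z(\theta)$ replaced by a self-normalised importance sampling estimate whose sample set includes the data point $x_0$ itself (conditional importance sampling).

```latex
% RNCE: x_0 is the data point, x_1, ..., x_J ~ q are noise samples
\mathcal{L}_{\mathrm{RNCE}}(\theta)
  = -\log \frac{\tilde p_\theta(x_0)/q(x_0)}{\sum_{j=0}^{J} \tilde p_\theta(x_j)/q(x_j)},
\qquad
w_j = \frac{\tilde p_\theta(x_j)/q(x_j)}{\sum_{k=0}^{J} \tilde p_\theta(x_k)/q(x_k)}

-\nabla_\theta \mathcal{L}_{\mathrm{RNCE}}(\theta)
  = \nabla_\theta \log \tilde p_\theta(x_0)
    - \sum_{j=0}^{J} w_j \, \nabla_\theta \log \tilde p_\theta(x_j)
  \approx \nabla_\theta \log \tilde p_\theta(x_0)
    - \mathbb{E}_{p_\theta}\!\left[\nabla_\theta \log \tilde p_\theta(x)\right]
  = \nabla_\theta \log p_\theta(x_0)

% CNCE: y ~ q(. | x) is conditional noise, \sigma is the logistic function
G(x, y) = \log \frac{\tilde p_\theta(x)\, q(y \mid x)}{\tilde p_\theta(y)\, q(x \mid y)},
\qquad
\mathcal{L}_{\mathrm{CNCE}}(\theta) = \log\!\left(1 + e^{-G(x, y)}\right)

-\nabla_\theta \mathcal{L}_{\mathrm{CNCE}}(\theta)
  = \sigma\!\left(-G(x, y)\right)
    \left[\nabla_\theta \log \tilde p_\theta(x) - \nabla_\theta \log \tilde p_\theta(y)\right]
```

In the CNCE line, $\sigma(-G(x, y))$ is the Barker acceptance probability for moving from $x$ to the proposal $y$. One way to read the result, under these assumptions, is as a one-step CD gradient $\nabla_\theta \log \tilde p_\theta(x) - \mathbb{E}[\nabla_\theta \log \tilde p_\theta(x')]$, where $x'$ is the state after one Barker step and the expectation is taken analytically over accept/reject.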

Noise-contrastive estimation (NCE) is a popular method for estimating unnormalised probabilistic models, such as energy-based models, which are effective for modelling complex data distributions. Unlike classical maximum likelihood (ML) estimation that relies on importance sampling (resulting in ML-IS) or MCMC (resulting in contrastive divergence, CD), NCE uses a proxy criterion to avoid the need for evaluating an often intractable normalisation constant. Despite apparent conceptual differences, we show that two NCE criteria, ranking NCE (RNCE) and conditional NCE (CNCE), can be viewed as ML estimation methods. Specifically, RNCE is equivalent to ML estimation combined with conditional importance sampling, and both RNCE and CNCE are special cases of CD. These findings bridge the gap between the two method classes and allow us to apply techniques from the ML-IS and CD literature to NCE, offering several advantageous extensions.
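The two equivalences described above can be checked numerically. The following is a minimal NumPy sketch, not the authors' code. It assumes a toy exponential-family model with $\log \tilde p_\theta(x) = \theta_1 x - \theta_2 x^2/2$, Gaussian noise $N(0, 2^2)$ for RNCE, and symmetric Gaussian random-walk noise for CNCE, so that the $q(y \mid x)/q(x \mid y)$ factor cancels. All function names are hypothetical. The script compares finite-difference gradients of each NCE loss against two estimates: a conditional importance sampling estimate of the ML gradient, and a one-step CD gradient that uses a Barker acceptance step averaged over accept/reject.

```python
import numpy as np

# Hypothetical helpers; a sketch under stated assumptions, not the paper's implementation.
# Toy exponential-family model (an assumption for illustration):
# log p~_theta(x) = theta[0] * x - theta[1] * x**2 / 2, so grad_theta log p~ = t(x).
def log_ptilde(theta, x):
    return theta[0] * x - 0.5 * theta[1] * x**2

def grad_log_ptilde(x):
    # Sufficient statistic t(x) = (x, -x^2/2); shape (..., 2).
    return np.stack([x, -0.5 * x**2], axis=-1)

def log_q(x, scale=2.0):
    # Log-density of the assumed noise distribution N(0, scale^2).
    return -0.5 * (x / scale) ** 2 - np.log(scale * np.sqrt(2.0 * np.pi))

def rnce_loss(theta, x0, noise):
    # RNCE: minus the log-probability of ranking the data point x0 first
    # among x0 and the J noise samples, using the ratios p~/q.
    xs = np.concatenate([[x0], noise])
    log_r = log_ptilde(theta, xs) - log_q(xs)
    return -(log_r[0] - np.logaddexp.reduce(log_r))

def conditional_is_grad(theta, x0, noise):
    # ML gradient grad log p~(x0) - E_p[grad log p~], with the expectation replaced
    # by self-normalised importance sampling over x0 AND the noise samples.
    xs = np.concatenate([[x0], noise])
    log_r = log_ptilde(theta, xs) - log_q(xs)
    w = np.exp(log_r - np.logaddexp.reduce(log_r))
    return grad_log_ptilde(x0) - w @ grad_log_ptilde(xs)

def cnce_loss(theta, x, y):
    # CNCE loss log(1 + exp(-G)); the noise y = x + eps is symmetric,
    # so the q(y|x)/q(x|y) factor in G cancels.
    G = log_ptilde(theta, x) - log_ptilde(theta, y)
    return np.logaddexp(0.0, -G)

def cd1_barker_grad(theta, x, y):
    # One-step CD: propose y, accept with Barker probability sigma(-G),
    # and average the update over accept/reject instead of sampling it.
    G = log_ptilde(theta, x) - log_ptilde(theta, y)
    accept = 1.0 / (1.0 + np.exp(G))
    return accept * (grad_log_ptilde(x) - grad_log_ptilde(y))

def numerical_grad(f, theta, eps=1e-6):
    # Central finite differences of a scalar function f at theta.
    g = np.zeros_like(theta)
    for i in range(theta.size):
        step = np.zeros_like(theta)
        step[i] = eps
        g[i] = (f(theta + step) - f(theta - step)) / (2.0 * eps)
    return g

rng = np.random.default_rng(0)
theta = np.array([0.5, 1.5])
x0 = 0.3
noise = rng.normal(0.0, 2.0, size=10)
y = x0 + rng.normal(0.0, 0.5)

g_rnce = -numerical_grad(lambda th: rnce_loss(th, x0, noise), theta)
g_cnce = -numerical_grad(lambda th: cnce_loss(th, x0, y), theta)
print(np.allclose(g_rnce, conditional_is_grad(theta, x0, noise), atol=1e-6))
print(np.allclose(g_cnce, cd1_barker_grad(theta, x0, y), atol=1e-6))
```

Both checks are expected to print True. Each identity holds exactly for the drawn samples, not just in expectation, so any mismatch could come only from finite-difference error. The toy model was chosen because its gradient $\nabla_\theta \log \tilde p_\theta$ has a closed form, which keeps the comparison free of autodiff machinery.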
