LG | ML | Oct 1, 2022

Pitfalls of Gaussians as a noise distribution in NCE

arXiv:2210.00189v2 | 7 citations | h-index: 26
Originality: Incremental advance
AI Analysis

This work highlights a critical pitfall for practitioners using NCE in machine learning, suggesting the need for more complex noise distributions to ensure efficiency.

The paper demonstrates that using a Gaussian noise distribution matching the data's mean and covariance in Noise Contrastive Estimation (NCE) can lead to exponentially poor conditioning of the loss Hessian in high dimensions, even for simple distributions, resulting in problematic statistical and algorithmic complexity.

Noise Contrastive Estimation (NCE) is a popular approach for learning probability density functions parameterized up to a constant of proportionality. The main idea is to design a classification problem for distinguishing training data from samples from an easy-to-sample noise distribution $q$, in a manner that avoids having to calculate a partition function. It is well-known that the choice of $q$ can severely impact the computational and statistical efficiency of NCE. In practice, a common choice for $q$ is a Gaussian which matches the mean and covariance of the data. In this paper, we show that such a choice can result in an exponentially bad (in the ambient dimension) conditioning of the Hessian of the loss, even for very simple data distributions. As a consequence, both the statistical and algorithmic complexity for such a choice of $q$ will be problematic in practice, suggesting that more complex noise distributions are essential to the success of NCE.
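
The classification problem described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's code: the unnormalized model `log p_theta(x) = -0.5 * ||x - theta||^2 + c`, the learned constant `c` (standing in for the negative log partition function), the helper names `log_gaussian` and `nce_loss`, and the isotropic simplification of the moment-matched noise are all assumptions made for the example. It shows the binary NCE loss only and does not reproduce the ill-conditioning result.

```python
import numpy as np


def log_gaussian(x, mean, var):
    """Row-wise log-density of the isotropic Gaussian N(mean, var * I)."""
    d = x.shape[1]
    sq_dist = np.sum((x - mean) ** 2, axis=1)
    return -0.5 * (sq_dist / var + d * np.log(2.0 * np.pi * var))


def nce_loss(theta, c, data, noise, log_q):
    """Binary NCE loss, weighting data and noise samples equally.

    Hypothetical model: log p_theta(x) = -0.5 * ||x - theta||^2 + c,
    where c is learned instead of computing a partition function.
    """
    def log_ratio(x):
        log_p = -0.5 * np.sum((x - theta) ** 2, axis=1) + c
        return log_p - log_q(x)

    # -log(sigmoid(z)) = log(1 + exp(-z)) = logaddexp(0, -z), computed stably.
    loss_data = np.logaddexp(0.0, -log_ratio(data))   # data should score high
    loss_noise = np.logaddexp(0.0, log_ratio(noise))  # noise should score low
    return loss_data.mean() + loss_noise.mean()


# Toy demo: unit-variance Gaussian data centered at 1 in d = 2 dimensions.
rng = np.random.default_rng(0)
d = 2
data = rng.normal(loc=1.0, scale=1.0, size=(5000, d))

# Noise q: a Gaussian matching the empirical mean and (averaged) variance.
q_mean = data.mean(axis=0)
q_var = data.var(axis=0).mean()
noise = q_mean + np.sqrt(q_var) * rng.normal(size=data.shape)
log_q = lambda x: log_gaussian(x, q_mean, q_var)

theta_true = np.ones(d)
c_true = -0.5 * d * np.log(2.0 * np.pi)  # exact log-normalizer for this model
loss_true = nce_loss(theta_true, c_true, data, noise, log_q)
loss_wrong = nce_loss(theta_true - 4.0, c_true, data, noise, log_q)
print(loss_true, loss_wrong)
```

When the model density equals $q$ exactly, the log-ratio is zero everywhere and the loss equals $2\log 2$: the classifier cannot do better than chance. The paper's concern is the opposite regime, where the data and the moment-matched Gaussian differ and the curvature of this loss around the optimum degrades with dimension.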
