LG, ML | Oct 1, 2022

Pitfalls of Gaussians as a noise distribution in NCE

Holden Lee, Chirag Pabbaraju, Anish Sevekari, Andrej Risteski

arXiv:2210.00189v2 | 9.67 citations | h-index: 26

Originality: Incremental advance

AI Analysis

This work highlights a critical pitfall for practitioners using NCE in machine learning, suggesting the need for more complex noise distributions to ensure efficiency.

The paper demonstrates that using a Gaussian noise distribution matching the data's mean and covariance in Noise Contrastive Estimation (NCE) can make the conditioning of the loss Hessian exponentially bad in the ambient dimension, even for very simple data distributions, so that both the statistical and the algorithmic complexity of NCE become problematic.

Noise Contrastive Estimation (NCE) is a popular approach for learning probability density functions parameterized up to a constant of proportionality. The main idea is to design a classification problem for distinguishing training data from samples from an easy-to-sample noise distribution $q$, in a manner that avoids having to calculate a partition function. It is well-known that the choice of $q$ can severely impact the computational and statistical efficiency of NCE. In practice, a common choice for $q$ is a Gaussian which matches the mean and covariance of the data. In this paper, we show that such a choice can result in an exponentially bad (in the ambient dimension) conditioning of the Hessian of the loss, even for very simple data distributions. As a consequence, both the statistical and algorithmic complexity for such a choice of $q$ will be problematic in practice, suggesting that more complex noise distributions are essential to the success of NCE.
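The classification problem described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: it assumes the standard binary NCE objective with equal numbers of data and noise samples, data of dimension at least 2, and a noise distribution $q$ that is a Gaussian fitted to the empirical mean and covariance of the data. The function names (`gaussian_logpdf`, `matched_gaussian_noise`, `nce_loss`) are hypothetical, and `log_p_model` stands for any unnormalized model log-density in which the log-normalizer is treated as an extra learned parameter.

```python
import numpy as np

def gaussian_logpdf(x, mean, cov):
    """Log-density of N(mean, cov) evaluated at each row of x."""
    d = mean.shape[0]
    diff = x - mean
    chol = np.linalg.cholesky(cov)
    # Solve L z = diff^T; the squared norm of z is the Mahalanobis distance.
    z = np.linalg.solve(chol, diff.T)
    maha = np.sum(z ** 2, axis=0)
    logdet = 2.0 * np.sum(np.log(np.diag(chol)))
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + maha)

def matched_gaussian_noise(x_data, n_noise, rng):
    """Sample from a Gaussian q matching the data's mean and covariance.

    Assumes x_data has shape (n, d) with d >= 2 and n > d, so the
    empirical covariance is a positive definite (d, d) matrix.
    """
    mean = x_data.mean(axis=0)
    cov = np.cov(x_data, rowvar=False)
    x_noise = rng.multivariate_normal(mean, cov, size=n_noise)

    def log_q(x):
        return gaussian_logpdf(x, mean, cov)

    return x_noise, log_q

def nce_loss(log_p_model, x_data, x_noise, log_q):
    """Binary NCE loss: classify data (label 1) against noise (label 0)."""
    # The classifier logit is the log-ratio of model density to noise density.
    g_data = log_p_model(x_data) - log_q(x_data)
    g_noise = log_p_model(x_noise) - log_q(x_noise)
    # -log sigmoid(g) = logaddexp(0, -g); -log(1 - sigmoid(g)) = logaddexp(0, g)
    return np.mean(np.logaddexp(0.0, -g_data)) + np.mean(np.logaddexp(0.0, g_noise))
```

As a sanity check on the sketch, if the model is set equal to the noise distribution (`log_p_model = log_q`), every logit is zero and the loss equals $2 \log 2$, the value attained by a classifier that cannot tell data from noise.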
