Analyzing and Improving the Optimization Landscape of Noise-Contrastive Estimation
This work addresses a foundational issue for researchers and practitioners who use unnormalized probabilistic models, though it is incremental in that it builds on existing NCE methods.
The paper tackles the poor performance of noise-contrastive estimation (NCE) under an inappropriate noise distribution. It proves that this poor performance stems from a flat loss landscape, and it introduces eNCE, a variant of NCE with an exponential loss for which normalized gradient descent provably overcomes these issues when the target and noise distributions belong to a given exponential family.
Noise-contrastive estimation (NCE) is a statistically consistent method for learning unnormalized probabilistic models. It has been empirically observed that the choice of the noise distribution is crucial for NCE's performance. However, such observations have never been made formal or quantitative. In fact, it is not even clear whether the difficulties arising from a poorly chosen noise distribution are statistical or algorithmic in nature. In this work, we formally pinpoint reasons for NCE's poor performance when an inappropriate noise distribution is used. Namely, we prove that these challenges arise from an ill-behaved (more precisely, flat) loss landscape. To address this, we introduce a variant of NCE called "eNCE", which uses an exponential loss and for which normalized gradient descent provably addresses the landscape issues when the target and noise distributions are in a given exponential family.
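The abstract does not spell out the eNCE objective, so the sketch below makes an explicit assumption: the exponential loss is applied to half the log-ratio r = log p_θ − log q, giving E_P[exp(−r/2)] + E_Q[exp(r/2)]. Minimizing p·e^{−r/2} + q·e^{r/2} pointwise gives e^r = p/q, so this assumed loss is minimized exactly when p_θ = p, as a consistent NCE-style objective should be. The toy exponential family (noise N(0, 1), target N(mu_star, 1), model log p_θ = log q + a·x + c with a learned log-normalizer c), the step schedule, and all function names (`log_ratio`, `nce_loss`, `ence_loss_and_grad`, `normalized_gd`) are hypothetical illustrations, not the authors' code or experiments.

```python
import math
import random

# Hypothetical 1-D illustration (assumed setup, not from the paper):
#   noise  Q = N(0, 1)
#   target P = N(mu_star, 1)
#   model  log p_theta(x) = log q(x) + a*x + c,  theta = (a, c)
# This is an exponential family with sufficient statistic T(x) = (x, 1) and base
# measure q; c plays the role of the learned log-normalizer in NCE.
# The model is normalized, and equals P, exactly at a = mu_star, c = -mu_star**2 / 2.


def log_ratio(theta, x):
    """r_theta(x) = log p_theta(x) - log q(x)."""
    a, c = theta
    return a * x + c


def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def nce_loss(theta, x_data, x_noise):
    """Standard NCE (logistic) loss, estimated from data and noise samples."""
    data_term = sum(softplus(-log_ratio(theta, x)) for x in x_data) / len(x_data)
    noise_term = sum(softplus(log_ratio(theta, x)) for x in x_noise) / len(x_noise)
    return data_term + noise_term


def ence_loss_and_grad(theta, mu_star):
    """ASSUMED eNCE loss E_P[exp(-r/2)] + E_Q[exp(r/2)] and its gradient.

    Both expectations are in closed form via E_{N(m,1)}[exp(t*x)] = exp(t*m + t^2/2).
    """
    a, c = theta
    s = -a * mu_star / 2 + a * a / 8 - c / 2  # log E_P[exp(-r/2)]
    w = a * a / 8 + c / 2                      # log E_Q[exp(r/2)]
    es, ew = math.exp(s), math.exp(w)
    loss = es + ew
    grad = (es * (-mu_star / 2 + a / 4) + ew * a / 4, (ew - es) / 2)
    return loss, grad


def normalized_gd(mu_star, steps=20000, eta0=0.2):
    """Normalized gradient descent theta <- theta - eta_t * g / ||g||.

    The decaying step eta_t = eta0 / sqrt(t + 1) is an assumed schedule.
    Starts from theta = (0, 0), i.e. p_theta = q.
    """
    a, c = 0.0, 0.0
    for t in range(steps):
        _, (ga, gc) = ence_loss_and_grad((a, c), mu_star)
        norm = math.hypot(ga, gc)
        if norm == 0.0:  # exactly at a stationary point
            break
        eta = eta0 / math.sqrt(t + 1)
        a -= eta * ga / norm
        c -= eta * gc / norm
    return a, c


if __name__ == "__main__":
    mu_star = 2.0
    a, c = normalized_gd(mu_star)
    print(f"a = {a:.3f} (target {mu_star}), c = {c:.3f} (target {-mu_star**2 / 2})")

    # Compare the standard NCE loss at the start (p_theta = q) and at the learned theta.
    rng = random.Random(0)
    x_data = [rng.gauss(mu_star, 1.0) for _ in range(5000)]
    x_noise = [rng.gauss(0.0, 1.0) for _ in range(5000)]
    print("NCE loss at theta = (0, 0):", nce_loss((0.0, 0.0), x_data, x_noise))
    print("NCE loss at learned theta: ", nce_loss((a, c), x_data, x_noise))
```

Because each update has length eta_t regardless of the gradient's magnitude, normalized gradient descent keeps moving even where the loss is very flat, whereas plain gradient descent would take steps proportional to a possibly tiny gradient norm. The closed-form expectations keep the eNCE part deterministic; in practice both expectations would be estimated from samples, as `nce_loss` does.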