Near-Optimality of Contrastive Divergence Algorithms
This provides a theoretical improvement for researchers in machine learning, specifically in probabilistic modeling, though it is incremental as it builds on prior asymptotic results.
The paper tackles the convergence rate of the contrastive divergence algorithm for training unnormalized models, showing it can achieve the parametric rate O(n^{-1/2}) under regularity assumptions, which is near-optimal as its asymptotic variance approaches the Cramér-Rao lower bound.
We perform a non-asymptotic analysis of the contrastive divergence (CD) algorithm, a training method for unnormalized models. While prior work has established that (for exponential family distributions) the CD iterates asymptotically converge at an $O(n^{-1 / 3})$ rate to the true parameter of the data distribution, we show, under some regularity assumptions, that CD can achieve the parametric rate $O(n^{-1 / 2})$. Our analysis provides results for various data batching schemes, including the fully online and minibatch ones. We additionally show that CD can be near-optimal, in the sense that its asymptotic variance is close to the Cramér-Rao lower bound.