ML, LG, PR, CO, ME | Oct 2, 2020

Accelerating Convergence of Replica Exchange Stochastic Gradient MCMC via Variance Reduction

arXiv:2010.01084v2 | 11 citations
AI Analysis

This work addresses a convergence bottleneck in non-convex learning problems such as deep learning. It improves an existing method (reSGLD) rather than introducing a new one, so the contribution is incremental but useful to practitioners.

The paper tackles the slow convergence of replica exchange stochastic gradient Langevin dynamics (reSGLD) in non-convex learning by reducing variance in noisy energy estimators, enabling more effective swaps. Theoretically, it shows exponential acceleration and tighter error bounds, and numerically, it achieves state-of-the-art results in optimization and uncertainty estimates on synthetic and image data.

Replica exchange stochastic gradient Langevin dynamics (reSGLD) has shown promise in accelerating convergence in non-convex learning; however, an excessively large correction for avoiding biases from noisy energy estimators has limited the potential of the acceleration. To address this issue, we study variance reduction for noisy energy estimators, which promotes much more effective swaps. Theoretically, we provide a non-asymptotic analysis of the exponential acceleration for the underlying continuous-time Markov jump process; moreover, we consider a generalized Girsanov theorem that includes the change of Poisson measure to overcome the crude discretization based on Grönwall's inequality, which yields a much tighter error in the 2-Wasserstein ($\mathcal{W}_2$) distance. Numerically, we conduct extensive experiments and obtain state-of-the-art results in optimization and uncertainty estimates for synthetic experiments and image data.
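The role of variance reduction in the swap step can be illustrated with a minimal sketch. It is not the paper's implementation: the toy per-example loss, the function names, and the exact form of the variance-proportional correction in `swap_probability` are assumptions made for illustration. The sketch uses a generic control-variate estimator, which evaluates the full-data energy once at an anchor parameter and estimates only the difference from the anchor on each mini-batch.

```python
import numpy as np

def per_example_energy(theta, x):
    # Hypothetical toy loss (an assumption): squared error of a scalar location parameter.
    return 0.5 * (x - theta) ** 2

def naive_energy(theta, data, batch_idx):
    # Plain mini-batch estimate of the full-data energy sum_i L(x_i | theta).
    scale = data.shape[0] / batch_idx.shape[0]
    return scale * per_example_energy(theta, data[batch_idx]).sum()

def vr_energy(theta, anchor, anchor_full_energy, data, batch_idx):
    # Control-variate estimate: mini-batch estimate of the energy difference
    # between theta and the anchor, plus the exact full-data energy at the anchor.
    scale = data.shape[0] / batch_idx.shape[0]
    batch = data[batch_idx]
    diff = per_example_energy(theta, batch) - per_example_energy(anchor, batch)
    return scale * diff.sum() + anchor_full_energy

def swap_probability(energy_low, energy_high, tau_low, tau_high, est_variance):
    # Assumed form of a bias-corrected swap rule: the correction term grows with
    # the variance of the energy estimate, so noisier estimators swap less often.
    d_inv_tau = 1.0 / tau_low - 1.0 / tau_high
    log_ratio = d_inv_tau * (energy_low - energy_high - d_inv_tau * est_variance)
    # exp(min(log_ratio, 0)) equals min(1, exp(log_ratio)) without overflow.
    return float(np.exp(min(log_ratio, 0.0)))

rng = np.random.default_rng(0)
data = rng.normal(loc=1.0, scale=2.0, size=10_000)
anchor, theta = 0.9, 1.0
anchor_full = per_example_energy(anchor, data).sum()

# Compare the spread of both estimators over repeated mini-batches.
batches = [rng.choice(data.shape[0], size=100, replace=False) for _ in range(200)]
naive = np.array([naive_energy(theta, data, b) for b in batches])
vr = np.array([vr_energy(theta, anchor, anchor_full, data, b) for b in batches])

print("naive estimator variance:", naive.var())
print("variance-reduced estimator variance:", vr.var())
print("swap probability, naive:", swap_probability(100.0, 90.0, 1.0, 10.0, naive.var()))
print("swap probability, variance-reduced:", swap_probability(100.0, 90.0, 1.0, 10.0, vr.var()))
```

In this toy setting the control-variate estimator has a much smaller variance whenever `theta` stays close to `anchor`, so the assumed correction term shrinks and the swap probability is no smaller than with the naive estimator. The anchor would need to be refreshed periodically in practice; how often is a tuning choice not covered here.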

Code Implementations: 1 repo
