PRLGMAOCJun 8, 2022

Utilising the CLT Structure in Stochastic Gradient based Sampling : Improved Analysis and Faster Algorithms

arXiv:2206.03792v67 citationsh-index: 16
Originality Highly original
AI Analysis

This work provides stronger theoretical foundations for widely used sampling algorithms in machine learning and statistics, potentially enabling more efficient Bayesian inference and optimization.

The paper tackles the problem of analyzing stochastic gradient-based sampling algorithms by exploiting the Central Limit Theorem structure of their noise, achieving improved convergence guarantees for Stochastic Gradient Langevin Dynamics (SGLD) and the Random Batch Method (RBM). It proves the first stable KL divergence convergence rate for SGLD without uniform warm start, leading to superior first-order oracle complexity under milder assumptions, and significantly improves RBM guarantees by removing exponential dependence on horizon.

We consider stochastic approximations of sampling algorithms, such as Stochastic Gradient Langevin Dynamics (SGLD) and the Random Batch Method (RBM) for Interacting Particle Dynamcs (IPD). We observe that the noise introduced by the stochastic approximation is nearly Gaussian due to the Central Limit Theorem (CLT) while the driving Brownian motion is exactly Gaussian. We harness this structure to absorb the stochastic approximation error inside the diffusion process, and obtain improved convergence guarantees for these algorithms. For SGLD, we prove the first stable convergence rate in KL divergence without requiring uniform warm start, assuming the target density satisfies a Log-Sobolev Inequality. Our result implies superior first-order oracle complexity compared to prior works, under significantly milder assumptions. We also prove the first guarantees for SGLD under even weaker conditions such as Hölder smoothness and Poincare Inequality, thus bridging the gap between the state-of-the-art guarantees for LMC and SGLD. Our analysis motivates a new algorithm called covariance correction, which corrects for the additional noise introduced by the stochastic approximation by rescaling the strength of the diffusion. Finally, we apply our techniques to analyze RBM, and significantly improve upon the guarantees in prior works (such as removing exponential dependence on horizon), under minimal assumptions.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes