MLOct 21, 2016

On the Convergence of Stochastic Gradient MCMC Algorithms with High-Order Integrators

arXiv:1610.06665v1169 citations
Originality Incremental advance
AI Analysis

This work provides theoretical guarantees for faster and more accurate Bayesian inference in large-scale data settings, though it is incremental as it extends existing SG-MCMC theory.

The paper tackles the lack of theoretical convergence analysis for stochastic gradient MCMC algorithms with high-order integrators, showing that using a 2nd-order symmetric splitting integrator achieves an optimal mean square error convergence rate of L^{-4/5} at L iterations, compared to L^{-2/3} for 1st-order methods.

Recent advances in Bayesian learning with large-scale data have witnessed emergence of stochastic gradient MCMC algorithms (SG-MCMC), such as stochastic gradient Langevin dynamics (SGLD), stochastic gradient Hamiltonian MCMC (SGHMC), and the stochastic gradient thermostat. While finite-time convergence properties of the SGLD with a 1st-order Euler integrator have recently been studied, corresponding theory for general SG-MCMCs has not been explored. In this paper we consider general SG-MCMCs with high-order integrators, and develop theory to analyze finite-time convergence properties and their asymptotic invariant measures. Our theoretical results show faster convergence rates and more accurate invariant measures for SG-MCMCs with higher-order integrators. For example, with the proposed efficient 2nd-order symmetric splitting integrator, the {\em mean square error} (MSE) of the posterior average for the SGHMC achieves an optimal convergence rate of $L^{-4/5}$ at $L$ iterations, compared to $L^{-2/3}$ for the SGHMC and SGLD with 1st-order Euler integrators. Furthermore, convergence results of decreasing-step-size SG-MCMCs are also developed, with the same convergence rates as their fixed-step-size counterparts for a specific decreasing sequence. Experiments on both synthetic and real datasets verify our theory, and show advantages of the proposed method in two large-scale real applications.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes