AMAGOLD: Amortized Metropolis Adjustment for Efficient Stochastic Gradient MCMC
AMAGOLD addresses the trade-off between bias and convergence speed in SGHMC, enabling efficient sampling for Bayesian inference on large datasets, though it is an incremental improvement over existing methods.
The authors tackle the bias in stochastic gradient Hamiltonian Monte Carlo (SGHMC) by proposing AMAGOLD, a second-order SG-MCMC algorithm that applies infrequent Metropolis-Hastings corrections to remove bias without requiring a diminishing step size. They prove that AMAGOLD converges to the target distribution with a fixed step size, at a rate at most a constant factor slower than a full-batch baseline.
Stochastic gradient Hamiltonian Monte Carlo (SGHMC) is an efficient method for sampling from continuous distributions. It is a faster alternative to HMC: instead of using the whole dataset at each iteration, SGHMC uses only a subsample. This improves performance, but introduces bias that can cause SGHMC to converge to the wrong distribution. One can prevent this by using a step size that decays to zero, but such a step size schedule can drastically slow down convergence. To address this tension, we propose a novel second-order SG-MCMC algorithm, AMAGOLD, that infrequently uses Metropolis-Hastings (M-H) corrections to remove bias. The infrequency of corrections amortizes their cost. We prove AMAGOLD converges to the target distribution with a fixed, rather than a diminishing, step size, and that its convergence rate is at most a constant factor slower than a full-batch baseline. We empirically demonstrate AMAGOLD's effectiveness on synthetic distributions, Bayesian logistic regression, and Bayesian neural networks.
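The abstract does not spell out the update rules, so the following is only a minimal sketch of the general idea: run T cheap minibatch-gradient momentum steps with a fixed step size, accumulate an energy term along the way, and then apply a single full-data M-H test. It is not the authors' reference implementation. The function name `amortized_mh_sghmc`, the friction parameterization `beta`, the noise variance `4 * beta * sigma**2`, the accumulator `rho`, and the toy model (estimating a Gaussian mean under a flat prior) are all assumptions made for illustration.

```python
import numpy as np


def amortized_mh_sghmc(x, n_outer=3000, T=5, eps=0.02, beta=0.05,
                       sigma=1.0, batch=50, seed=0):
    """Sketch: T minibatch momentum steps, then one full-data M-H test.

    Toy target (assumption): posterior of a Gaussian mean with unit
    observation noise and a flat prior, i.e. N(mean(x), 1 / len(x)).
    """
    rng = np.random.default_rng(seed)
    n = x.size

    def U(theta):
        # Full-data potential; evaluated only once per outer iteration.
        return 0.5 * np.sum((theta - x) ** 2)

    def grad_U_minibatch(theta):
        # Unbiased minibatch estimate of dU/dtheta.
        idx = rng.choice(n, size=batch, replace=False)
        return (n / batch) * np.sum(theta - x[idx])

    theta, r = 0.0, rng.normal(0.0, sigma)
    samples = np.empty(n_outer)
    accepted = 0
    for k in range(n_outer):
        r_half = r
        theta_t = theta + 0.5 * eps * r_half / sigma**2  # initial half step
        rho = 0.0  # accumulated energy term used in the M-H test
        for t in range(T):
            if t > 0:
                theta_t = theta_t + eps * r_half / sigma**2
            g = grad_U_minibatch(theta_t)
            eta = rng.normal(0.0, np.sqrt(4.0 * beta) * sigma)
            # Symmetric friction update (assumed form) keeps the step reversible.
            r_new = ((1.0 - beta) * r_half - eps * g + eta) / (1.0 + beta)
            rho += 0.5 * eps * g * (r_half + r_new) / sigma**2
            r_half = r_new
        theta_prop = theta_t + 0.5 * eps * r_half / sigma**2  # final half step

        # One full-data M-H test per T stochastic steps (the amortization).
        log_accept = U(theta) - U(theta_prop) + rho
        if np.log(rng.uniform()) < log_accept:
            theta, r = theta_prop, r_half
            accepted += 1
        else:
            r = -r  # flip momentum on rejection
        samples[k] = theta
    return samples, accepted / n_outer


if __name__ == "__main__":
    data = np.random.default_rng(1).normal(1.0, 1.0, size=100)
    draws, rate = amortized_mh_sghmc(data)
    print("acceptance rate:", rate)
    print("posterior mean estimate:", draws[500:].mean(), "data mean:", data.mean())
```

In this sketch the full-data potential `U` is evaluated once per outer iteration, while each of the T inner steps touches only `batch` data points. That is the sense in which the correction cost is amortized. With `beta = 0` and exact gradients the acceptance test reduces to the standard HMC test.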