LG, CO - Jun 30, 2021

Revisiting the Effects of Stochasticity for Hamiltonian Samplers

arXiv:2106.16200v2, 3 citations
Originality: Incremental advance
AI Analysis

This work identifies a convergence bottleneck in mini-batch Hamiltonian samplers for Bayesian inference, with experiments on regression and classification tasks for Bayesian neural networks. It is rated as incremental because it revises prior theoretical results.

The paper revisits the theoretical properties of Hamiltonian stochastic differential equations for Bayesian posterior sampling, analyzing discretization and gradient noise errors from data subsampling, and finds that with mini-batches, the best achievable error rate is O(η²) where η is the step size.

We revisit the theoretical properties of Hamiltonian stochastic differential equations (SDEs) for Bayesian posterior sampling, and we study the two types of errors that arise from numerical SDE simulation: the discretization error and the error due to noisy gradient estimates in the context of data subsampling. Our main result is a novel analysis of the effect of mini-batches through the lens of differential operator splitting, revising previous literature results. The stochastic component of a Hamiltonian SDE is decoupled from the gradient noise, for which we make no normality assumptions. This leads to the identification of a convergence bottleneck: when considering mini-batches, the best achievable error rate is $\mathcal{O}(η^2)$, with $η$ being the integrator step size. Our theoretical results are supported by an empirical study on a variety of regression and classification tasks for Bayesian neural networks.
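To make the two error sources concrete, the following is a minimal sketch of a mini-batch Hamiltonian sampler on a toy problem. It is not the paper's integrator or its experimental setup: it assumes a simple Euler-type update, unit mass, a flat prior, and unit-variance Gaussian data, so that the exact posterior over the mean is known. The function name `sghmc_sample` and all parameter names and defaults are hypothetical.

```python
import numpy as np


def sghmc_sample(data, step_size=0.01, friction=10.0, batch_size=10,
                 n_steps=20000, seed=0):
    """Sample the posterior over the mean of unit-variance Gaussian data.

    Assumption (for illustration only): flat prior, so the potential is
    U(theta) = 0.5 * sum_i (theta - x_i)^2 and the exact posterior is
    N(mean(data), 1 / len(data)).
    """
    data = np.asarray(data, dtype=float)
    rng = np.random.default_rng(seed)
    n = len(data)
    theta = 0.0  # position (the model parameter)
    r = 0.0      # momentum, with unit mass
    samples = np.empty(n_steps)
    # Standard deviation of the Brownian increment injected at every step.
    noise_scale = np.sqrt(2.0 * friction * step_size)
    for t in range(n_steps):
        # Unbiased mini-batch estimate of grad U(theta). Its noise comes from
        # subsampling and is not assumed to be Gaussian.
        batch = rng.choice(data, size=batch_size, replace=False)
        grad_estimate = (n / batch_size) * np.sum(theta - batch)
        # Momentum update: noisy force, friction, and injected noise.
        r += (-step_size * grad_estimate
              - step_size * friction * r
              + noise_scale * rng.standard_normal())
        # Position update using the freshly updated momentum.
        theta += step_size * r
        samples[t] = theta
    return samples
```

Under these assumptions, one way to observe the discretization and gradient-noise errors is to compare `samples.var()` with the exact posterior variance `1 / len(data)` while varying `step_size` and `batch_size`. Setting `batch_size=len(data)` removes the subsampling noise and leaves only the discretization error.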
