From Global to Local: A Scalable Benchmark for Local Posterior Sampling
This paper addresses a gap in theoretical guarantees for SGMCMC algorithms in machine learning by focusing on local sampling performance. The contribution is incremental: it provides a benchmark rather than a new method.
The paper examines how stochastic gradient MCMC (SGMCMC) algorithms interact with degenerate loss landscapes in neural networks. It introduces a scalable benchmark for local posterior sampling and finds that, among the algorithms evaluated, RMSProp-preconditioned SGLD most faithfully represents the local geometry of the posterior, with non-trivial local information extracted in models with up to O(100M) parameters.
Degeneracy is an inherent feature of the loss landscape of neural networks, but it is not well understood how stochastic gradient MCMC (SGMCMC) algorithms interact with this degeneracy. In particular, current global convergence guarantees for common SGMCMC algorithms rely on assumptions which are likely incompatible with degenerate loss landscapes. In this paper, we argue that this gap requires a shift in focus from global to local posterior sampling, and, as a first step, we introduce a novel scalable benchmark for evaluating the local sampling performance of SGMCMC algorithms. We evaluate a number of common algorithms, and find that RMSProp-preconditioned SGLD is most effective at faithfully representing the local geometry of the posterior distribution. Although we lack theoretical guarantees about global sampler convergence, our empirical results show that we are able to extract non-trivial local information in models with up to O(100M) parameters.
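For readers unfamiliar with the sampler named above, the following is a minimal sketch of one common form of RMSProp-preconditioned SGLD, not the paper's implementation. The toy potential U(w) = 0.5 (w0 w1)^2, the function names, and all hyperparameter values are assumptions chosen for illustration. The curvature-correction term of the preconditioned update is omitted, as is common in practice, and the sketch does not show how a benchmark would localize the chain around a reference point.

```python
import numpy as np


def psgld_step(theta, grad, v, rng, step_size=1e-3, decay=0.99, eps=1e-3):
    """One RMSProp-preconditioned SGLD update (correction term omitted).

    theta: current parameters; grad: gradient of the negative log posterior;
    v: running average of squared gradients. All defaults are illustrative.
    """
    # RMSProp running estimate of the squared gradient, per coordinate.
    v = decay * v + (1.0 - decay) * grad**2
    # Diagonal preconditioner: larger steps where gradients are small.
    precond = 1.0 / (np.sqrt(v) + eps)
    noise = rng.standard_normal(theta.shape)
    # Preconditioned drift plus Gaussian noise scaled by the same preconditioner.
    theta = (
        theta
        - 0.5 * step_size * precond * grad
        + np.sqrt(step_size * precond) * noise
    )
    return theta, v


def toy_grad(theta):
    """Gradient of the assumed toy potential U(w) = 0.5 * (w0 * w1)**2.

    The minimum set is the union of both axes, so the landscape is degenerate.
    """
    prod = theta[0] * theta[1]
    return np.array([prod * theta[1], prod * theta[0]])


def run_chain(n_steps=1000, seed=0):
    """Run a short chain on the toy potential and return all samples."""
    rng = np.random.default_rng(seed)
    theta = np.array([0.5, 0.5])
    v = np.zeros_like(theta)
    samples = np.empty((n_steps, theta.size))
    for t in range(n_steps):
        theta, v = psgld_step(theta, toy_grad(theta), v, rng)
        samples[t] = theta
    return samples


samples = run_chain()
```

On a real model, `toy_grad` would be replaced by a minibatch estimate of the gradient of the negative log posterior, and the step size and decay would need tuning.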