Scalable Natural Gradient Langevin Dynamics in Practice
This work addresses a bottleneck in Bayesian modeling for large datasets and models, but appears incremental as it compares existing preconditioning methods without introducing a new paradigm.
The paper tackles the poor convergence and slow mixing of Stochastic Gradient Langevin Dynamics (SGLD), which arise because the injected noise is independent and identically scaled across parameters, by comparing preconditioning approaches that normalize the noise vector. The approaches are benchmarked on mixing times, regularization on small datasets, covariate shift detection, and resistance to adversarial examples; the abstract reports no concrete numbers.
Stochastic Gradient Langevin Dynamics (SGLD) is a sampling scheme for Bayesian modeling adapted to large datasets and models. SGLD relies on the injection of Gaussian noise at each step of a Stochastic Gradient Descent (SGD) update. In this scheme, every component in the noise vector is independent and has the same scale, whereas the parameters we seek to estimate exhibit strong variations in scale and significant correlation structures, leading to poor convergence and mixing times. We compare different preconditioning approaches to the normalization of the noise vector and benchmark these approaches on the following criteria: 1) mixing times of the multivariate parameter vector, 2) regularizing effect on small datasets where it is easy to overfit, 3) covariate shift detection, and 4) resistance to adversarial examples.
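To make the contrast between uniform and preconditioned noise concrete, the following is a minimal sketch rather than the paper's implementation. The abstract does not name the preconditioners it compares, so the diagonal RMSprop-style preconditioner used here is an assumption chosen for illustration, and the function names (sgld_step, psgld_step) and parameters (decay, damping) are hypothetical. The sketch also omits the correction term that accounts for the preconditioner changing with the parameters.

```python
import numpy as np


def sgld_step(theta, grad_log_post, step_size, rng):
    """Plain SGLD update: every coordinate receives noise of the same variance."""
    # Isotropic Gaussian noise with variance step_size in each coordinate.
    noise = rng.normal(size=theta.shape) * np.sqrt(step_size)
    # Half-step along the (stochastic) gradient of the log posterior, plus noise.
    return theta + 0.5 * step_size * grad_log_post + noise


def psgld_step(theta, grad_log_post, second_moment, step_size, rng,
               decay=0.99, damping=1e-5):
    """SGLD with a diagonal RMSprop-style preconditioner (illustrative assumption).

    The running second moment of the gradient rescales both the drift and the
    noise per coordinate, so the noise is no longer identically scaled.
    """
    # Exponential moving average of squared gradients.
    second_moment = decay * second_moment + (1.0 - decay) * grad_log_post ** 2
    # Diagonal preconditioner: small where gradients are large, and vice versa.
    precond = 1.0 / (damping + np.sqrt(second_moment))
    # Noise variance is step_size * precond, so it varies across coordinates.
    noise = rng.normal(size=theta.shape) * np.sqrt(step_size * precond)
    theta = theta + 0.5 * step_size * precond * grad_log_post + noise
    return theta, second_moment
```

In use, grad_log_post would be a minibatch estimate of the gradient of the log posterior, and second_moment would be initialized to an array of the same shape as theta (for example, zeros) and threaded through successive calls. A full (non-diagonal) preconditioner would be needed to capture the correlation structure mentioned in the abstract; the diagonal form only addresses differences in scale.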