Peter A. Whalley

ML
h-index 41
3 papers
16 citations
Novelty 37%
AI Score 38

3 Papers

85.4 CO Apr 27
Theoretical guarantees for stochastic gradient sampling methods via Gaussian convolution inequalities

Daniel Paulin, Peter A. Whalley

We derive first-order (in the stepsize) bounds on the bias in Wasserstein distances of the invariant measure of stochastic gradient kinetic Langevin dynamics with minimal assumptions on the stochastic gradient noise. These bounds sharpen existing non-asymptotic guarantees for stochastic-gradient MCMC methods and provide a quantitative resolution of a previously open problem on invariant measure accuracy. The main technical ingredients are new Gaussian convolution inequalities controlling the Wasserstein-$p$ distance between a Gaussian convolved with a mean-zero perturbation and the Gaussian itself. We anticipate that these inequalities will be of independent interest beyond the present application.
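
A one-dimensional special case gives some intuition for why a mean-zero perturbation of a Gaussian can be controlled at higher order than a naive coupling suggests. This is a standard Gaussian computation offered as an illustration only, not the inequality proved in the paper; the symbols $\sigma$ and $\tau$ are notation introduced here.

```latex
% Illustration only: a Gaussian perturbation in one dimension.
% \sigma (standard deviation of the base Gaussian) and \tau (standard deviation
% of the mean-zero perturbation) are assumed notation, not taken from the paper.
\[
W_2\bigl(\mathcal{N}(0,\sigma^2) * \mathcal{N}(0,\tau^2),\ \mathcal{N}(0,\sigma^2)\bigr)
= \sqrt{\sigma^2+\tau^2} - \sigma
= \frac{\tau^2}{\sqrt{\sigma^2+\tau^2}+\sigma}
\le \frac{\tau^2}{2\sigma}.
\]
% By contrast, the trivial coupling (X + \xi, X) only yields the first-order bound W_2 \le \tau.
```

In this toy case the distance is second order in the perturbation size $\tau$, whereas the trivial coupling bound is only first order.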

ML Oct 14, 2024
Sampling from Bayesian Neural Network Posteriors with Symmetric Minibatch Splitting Langevin Dynamics

Daniel Paulin, Peter A. Whalley, Neil K. Chada et al.

We propose a scalable kinetic Langevin dynamics algorithm for sampling parameter spaces of big data and AI applications. Our scheme combines a symmetric forward/backward sweep over minibatches with a symmetric discretization of Langevin dynamics. For a particular Langevin splitting method (UBU), we show that the resulting Symmetric Minibatch Splitting-UBU (SMS-UBU) integrator has bias $O(h^2 d^{1/2})$ in dimension $d>0$ with stepsize $h>0$, despite only using one minibatch per iteration, thus providing excellent control of the sampling bias as a function of the stepsize. We apply the algorithm to explore local modes of the posterior distribution of Bayesian neural networks (BNNs) and evaluate the calibration performance of the posterior predictive probabilities for convolutional neural network architectures on classification problems using three different datasets (Fashion-MNIST, Celeb-A and chest X-ray). Our results indicate that BNNs sampled with SMS-UBU can offer significantly better calibration performance than standard training methods and stochastic weight averaging.
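
The symmetric forward/backward sweep described above can be sketched as a minibatch schedule: visit the batches in some order, then revisit them in the mirrored order, using one minibatch per integrator step. This is a minimal sketch of one plausible reading of the abstract, not the authors' implementation; the function names, the random ordering of the forward sweep, and the `step_fn` callback (standing in for one integrator step such as UBU) are all assumptions made for illustration.

```python
import random


def symmetric_minibatch_schedule(num_batches, seed=None):
    """Return minibatch indices for a forward sweep followed by its mirror image.

    Hypothetical helper: the random forward order is an assumption, not
    something stated in the abstract.
    """
    order = list(range(num_batches))
    random.Random(seed).shuffle(order)  # forward sweep order (assumed random)
    return order + order[::-1]          # backward sweep mirrors the forward one


def run_symmetric_sweep(step_fn, state, num_batches, seed=None):
    """Apply one integrator step per minibatch along the symmetric schedule.

    `step_fn(state, batch_index)` is a placeholder for a single step of the
    chosen Langevin integrator using the gradient from that minibatch.
    """
    for batch_index in symmetric_minibatch_schedule(num_batches, seed):
        state = step_fn(state, batch_index)
    return state
```

Each step still touches a single minibatch; only the order of the batches is constrained, so the schedule reads the same forwards and backwards.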

ML Feb 13, 2024
Correction to "Wasserstein distance estimates for the distributions of numerical approximations to ergodic stochastic differential equations"

Daniel Paulin, Peter A. Whalley

A method for analyzing non-asymptotic guarantees of numerical discretizations of ergodic SDEs in Wasserstein-2 distance is presented by Sanz-Serna and Zygalakis in "Wasserstein distance estimates for the distributions of numerical approximations to ergodic stochastic differential equations". They analyze the UBU integrator, which has strong order two and requires only one gradient evaluation per step, resulting in desirable non-asymptotic guarantees: in particular, $\mathcal{O}(d^{1/4}\epsilon^{-1/2})$ steps to come within Wasserstein-2 distance $\epsilon > 0$ of the target distribution. However, there is a mistake in the local error estimates in Sanz-Serna and Zygalakis (2021); in particular, a stronger assumption is needed to achieve these complexity estimates. This note reconciles the theory with the dimension dependence observed in practice in many applications of interest.
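
To make the quoted complexity concrete, the scaling can be worked through under the idealized assumption that the step count is exactly a constant times $d^{1/4}\epsilon^{-1/2}$. The constant $C$ and the function $N$ are hypothetical notation introduced here, and the rate itself holds only under the stronger assumption the note refers to.

```latex
% Idealized scaling; C and N are notation introduced for illustration.
\[
N(d,\epsilon) = C\, d^{1/4}\, \epsilon^{-1/2}
\quad\Longrightarrow\quad
\frac{N(16d,\epsilon)}{N(d,\epsilon)} = 16^{1/4} = 2,
\qquad
\frac{N(d,\epsilon/4)}{N(d,\epsilon)} = 4^{1/2} = 2.
\]
```

Under this rate, a sixteenfold increase in dimension or a fourfold reduction in the target accuracy would each double the number of steps.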