LG · ML · Aug 26, 2024

Function-Space MCMC for Bayesian Wide Neural Networks

Lucia Pezzetti, Stefano Favaro, Stefano Peluchetti

arXiv:2408.14325v46.41 citations · h-index: 9 · Has Code

Originality: Incremental advance

AI Analysis

This paper addresses computational bottlenecks in Bayesian deep learning and is aimed at researchers and practitioners who work on uncertainty quantification in wide neural networks. It is an incremental improvement in MCMC methods for this specific domain.

The paper tackles efficient Bayesian inference for wide neural networks by proposing function-space MCMC methods based on the preconditioned Crank-Nicolson (pCN) algorithm. It shows that the acceptance probabilities approach 1 as the network width increases, without any stepsize tuning, and it reports higher effective sample sizes and improved diagnostics compared with the other samplers considered.

Bayesian Neural Networks represent a fascinating confluence of deep learning and probabilistic reasoning, offering a compelling framework for understanding uncertainty in complex predictive models. In this paper, we investigate the use of the preconditioned Crank-Nicolson algorithm and its Langevin version to sample from a reparametrised posterior distribution of the neural network's weights as the widths grow larger. The proposed algorithms are robust in the infinite-dimensional setting, and we prove that their acceptance probabilities approach 1 as the width of the network increases, independently of any stepsize tuning. Moreover, we examine and compare how the mixing speeds of the underdamped Langevin Monte Carlo, the preconditioned Crank-Nicolson and the preconditioned Crank-Nicolson Langevin samplers are influenced by changes in the network width in some real-world cases. Our findings suggest that, in wide Bayesian Neural Network configurations, the preconditioned Crank-Nicolson algorithm allows for scalable and more efficient sampling of the reparametrised posterior distribution, as evidenced by a higher effective sample size and improved diagnostic results compared with the other analysed algorithms.
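
For readers unfamiliar with the preconditioned Crank-Nicolson (pCN) update, the following is a minimal sketch of the generic algorithm for a target with a standard Gaussian prior N(0, I) and a user-supplied negative log-likelihood. It is not the authors' implementation: the abstract does not spell out the reparametrised posterior, so the function name `pcn_sample`, the argument `neg_log_lik`, and the default `beta=0.2` are assumptions chosen for illustration.

```python
import numpy as np

def pcn_sample(neg_log_lik, theta0, n_steps, beta=0.2, seed=0):
    """Generic pCN sampler, assuming a N(0, I) prior on a 1-D parameter vector.

    neg_log_lik: hypothetical callable returning the negative log-likelihood
                 Phi(theta) as a float.
    beta:        step size in (0, 1]; larger values give bolder proposals.
    """
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float)
    phi = neg_log_lik(theta)
    samples = np.empty((n_steps, theta.size))
    n_accept = 0
    for i in range(n_steps):
        # pCN proposal: shrink the current state and add fresh prior noise.
        # This proposal leaves the N(0, I) prior invariant.
        xi = rng.standard_normal(theta.shape)
        prop = np.sqrt(1.0 - beta**2) * theta + beta * xi
        phi_prop = neg_log_lik(prop)
        # The acceptance ratio involves only the likelihood:
        # accept with probability min(1, exp(Phi(theta) - Phi(prop))).
        if np.log(rng.uniform()) < phi - phi_prop:
            theta, phi = prop, phi_prop
            n_accept += 1
        samples[i] = theta
    return samples, n_accept / n_steps
```

Because the prior terms cancel in the acceptance ratio, a nearly flat negative log-likelihood yields an acceptance rate close to 1 for any `beta`. In this sketch, calling `pcn_sample(lambda t: 0.0, np.zeros(3), 50)` accepts every proposal. This only illustrates the mechanism; the paper's result concerns the wide-network limit of the reparametrised posterior.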
