ME CO ML Dec 20, 2020

Trace-class Gaussian priors for Bayesian learning of neural networks with MCMC

arXiv:2012.10943v34.38 citations Has Code

Originality Highly original

AI Analysis

This work addresses the scalability and stability of Bayesian inference for neural networks. It targets practitioners who run MCMC in high-dimensional function spaces and need acceptance probabilities that do not degrade under mesh refinement.

This paper introduces a new Gaussian neural network prior for real-valued functions that scales more efficiently with the domain dimension than Karhunen-Loève priors. The authors demonstrate that the induced posterior is amenable to Hilbert space MCMC, with acceptance probabilities that remain stable even in the infinite-width limit, and apply it to Bayesian Reinforcement Learning.

This paper introduces a new neural-network-based prior for real-valued functions on $\mathbb R^d$ which, by construction, is more easily and cheaply scaled up in the domain dimension $d$ than the usual Karhunen-Loève function space prior. The new prior is a Gaussian neural network prior, where each weight and bias has an independent Gaussian prior, but with the key difference that the variances decrease in the width of the network in such a way that the resulting function is \emph{almost surely} well defined in the limit of an infinite-width network. We show that in a Bayesian treatment of inferring unknown functions, the induced posterior over functions is amenable to Monte Carlo sampling using Hilbert space Markov chain Monte Carlo (MCMC) methods. This type of MCMC is popular, e.g. in the Bayesian Inverse Problems literature, because it is stable under \emph{mesh refinement}, i.e. the acceptance probability does not shrink to $0$ as more parameters of the function's prior are introduced, even \emph{ad infinitum}. In numerical examples we demonstrate these stated competitive advantages over other function space priors. We also implement examples in Bayesian Reinforcement Learning to automate tasks from data and demonstrate, for the first time, stability of MCMC to mesh refinement for these types of problems.
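The two ideas in the abstract, a prior whose variances decay with the width and a Hilbert space MCMC sampler, can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration and not taken from the paper: a one-hidden-layer tanh network on a scalar input, a power-law standard deviation $j^{-\alpha/2}$ for the parameters of hidden unit $j$ (so the variances $j^{-\alpha}$ are summable for $\alpha > 1$), a Gaussian likelihood, and the preconditioned Crank-Nicolson (pCN) proposal as one common Hilbert space MCMC method. The function names `prior_std`, `network`, `neg_log_lik`, and `pcn` are hypothetical.

```python
import math
import random


def prior_std(width, alpha=2.0):
    """Prior standard deviations for (w1, b1, w2) of a one-hidden-layer net.

    Assumed decay: hidden unit j gets std j^(-alpha/2), so the variances
    j^(-alpha) have a finite sum for alpha > 1 however large the width is.
    """
    s = [j ** (-alpha / 2.0) for j in range(1, width + 1)]
    return s + s + s


def network(theta, x):
    """Evaluate f(x) = sum_j w2[j] * tanh(w1[j] * x + b1[j]) for scalar x."""
    width = len(theta) // 3
    w1, b1, w2 = theta[:width], theta[width:2 * width], theta[2 * width:]
    return sum(w2[j] * math.tanh(w1[j] * x + b1[j]) for j in range(width))


def neg_log_lik(theta, xs, ys, noise_std=0.1):
    """Negative Gaussian log-likelihood, up to an additive constant."""
    sq = sum((y - network(theta, x)) ** 2 for x, y in zip(xs, ys))
    return 0.5 * sq / noise_std ** 2


def pcn(xs, ys, width, n_steps=500, beta=0.2, alpha=2.0, seed=0):
    """pCN sampler: returns the final state and the acceptance rate."""
    rng = random.Random(seed)
    std = prior_std(width, alpha)
    # Start from a draw of the prior N(0, diag(std^2)).
    theta = [s * rng.gauss(0.0, 1.0) for s in std]
    phi = neg_log_lik(theta, xs, ys)
    accepted = 0
    shrink = math.sqrt(1.0 - beta ** 2)
    for _ in range(n_steps):
        # The proposal mixes the current state with a fresh prior draw.
        prop = [shrink * t + beta * s * rng.gauss(0.0, 1.0)
                for t, s in zip(theta, std)]
        phi_prop = neg_log_lik(prop, xs, ys)
        # The proposal is reversible with respect to the prior, so only
        # the likelihood enters: accept w.p. min(1, exp(phi - phi_prop)).
        if math.log(1.0 - rng.random()) < phi - phi_prop:
            theta, phi = prop, phi_prop
            accepted += 1
    return theta, accepted / n_steps


if __name__ == "__main__":
    xs = [i / 10.0 for i in range(-10, 11)]
    ys = [math.sin(3.0 * x) for x in xs]
    for width in (10, 40, 160):
        _, rate = pcn(xs, ys, width)
        print(width, rate)
```

Running the script prints the acceptance rate for several widths at a fixed step size `beta`, which is one rough way to probe the mesh-refinement behaviour the abstract describes. The sketch makes no claim about the values observed, and the paper's actual variance schedule and sampler settings may differ.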
