Exact marginal prior distributions of finite Bayesian neural networks
This work addresses a theoretical gap in the understanding of finite-width Bayesian neural networks, which recent work suggests may outperform infinite-width models, by providing exact characterizations of their function space priors. The contribution is an incremental advance in the mathematical foundations of these models.
The authors derive exact solutions for the function space priors, at individual input examples, of finite fully-connected feedforward Bayesian neural networks. For deep linear networks, the prior is expressed in terms of the Meijer $G$-function; for ReLU networks, it is a mixture of the priors of linear networks of smaller widths. These results unify previous descriptions of finite network priors in terms of their tail decay and large-width behavior.
Bayesian neural networks are theoretically well-understood only in the infinite-width limit, where Gaussian priors over network weights yield Gaussian priors over network outputs. Recent work has suggested that finite Bayesian networks may outperform their infinite counterparts, but their non-Gaussian function space priors have been characterized only through perturbative approaches. Here, we derive exact solutions for the function space priors for individual input examples of a class of finite fully-connected feedforward Bayesian neural networks. For deep linear networks, the prior has a simple expression in terms of the Meijer $G$-function. The prior of a finite ReLU network is a mixture of the priors of linear networks of smaller widths, corresponding to different numbers of active units in each layer. Our results unify previous descriptions of finite network priors in terms of their tail decay and large-width behavior.
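The non-Gaussianity of a finite-width prior can be illustrated numerically. The following is a minimal Monte Carlo sketch, not the exact Meijer $G$-function result: it assumes a linear network with one hidden layer, a scalar output, and independent zero-mean Gaussian weights with variance 1/fan-in. This scaling and the function names `sample_linear_output`, `kurtosis`, and `theoretical_kurtosis` are illustrative choices, not taken from the paper. Under these assumptions the output at a fixed input is a Gaussian scale mixture with kurtosis 3(1 + 2/width), which exceeds the Gaussian value of 3 at any finite width and approaches it as the width grows.

```python
import math
import random


def sample_linear_output(x, width, rng):
    """Draw one scalar prior output of a one-hidden-layer linear network at input x.

    Assumption (illustrative): weights are independent N(0, 1/fan_in).
    """
    d = len(x)
    # Hidden units: h_j = sum_i W1[j][i] * x[i], with W1 entries ~ N(0, 1/d).
    hidden = [
        sum(rng.gauss(0.0, 1.0 / math.sqrt(d)) * xi for xi in x)
        for _ in range(width)
    ]
    # Scalar readout with weights ~ N(0, 1/width).
    return sum(rng.gauss(0.0, 1.0 / math.sqrt(width)) * h for h in hidden)


def kurtosis(samples):
    """Sample kurtosis m4 / m2^2, which equals 3 for a Gaussian."""
    n = len(samples)
    mean = sum(samples) / n
    m2 = sum((s - mean) ** 2 for s in samples) / n
    m4 = sum((s - mean) ** 4 for s in samples) / n
    return m4 / (m2 * m2)


def theoretical_kurtosis(width):
    """Kurtosis of the output under the assumptions above: 3 * (1 + 2 / width)."""
    return 3.0 * (1.0 + 2.0 / width)


if __name__ == "__main__":
    rng = random.Random(0)
    x = [1.0, -0.5, 2.0]  # arbitrary example input
    for width in (2, 8, 32):
        draws = [sample_linear_output(x, width, rng) for _ in range(20000)]
        # Prints width, Monte Carlo estimate, and the closed-form value.
        print(width, round(kurtosis(draws), 2), round(theoretical_kurtosis(width), 2))
```

The sample kurtosis is a noisy estimator for heavy-tailed distributions, so the Monte Carlo column should be read as a rough check on the closed-form value rather than a precise match.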