Bayesian Neural Network Priors Revisited
This work addresses how to improve priors for Bayesian neural networks, offering researchers and practitioners incremental performance gains derived from an empirical analysis of the weights of trained networks.
The authors address the possibly suboptimal isotropic Gaussian priors used in Bayesian neural networks by analyzing summary statistics of the weights of SGD-trained networks. They find strong spatial correlations in CNN and ResNet weights and heavy-tailed weight distributions in FCNNs. Building these observations into the priors can improve performance on image classification datasets. The resulting priors mitigate the cold posterior effect in FCNNs but slightly increase it in ResNets.
Isotropic Gaussian priors are the de facto standard for modern Bayesian neural network inference. However, it is unclear whether these priors accurately reflect our true beliefs about the weight distributions or give optimal performance. To find better priors, we study summary statistics of neural network weights in networks trained using stochastic gradient descent (SGD). We find that convolutional neural network (CNN) and ResNet weights display strong spatial correlations, while fully connected networks (FCNNs) display heavy-tailed weight distributions. We show that building these observations into priors can lead to improved performance on a variety of image classification datasets. Surprisingly, these priors mitigate the cold posterior effect in FCNNs, but slightly increase the cold posterior effect in ResNets.
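The contrast between an isotropic Gaussian prior and the two alternatives suggested by the weight statistics can be illustrated with a minimal sketch. It is not the authors' implementation. The Student-t density (as one possible heavy-tailed choice), the RBF covariance over filter positions, and all function names and parameter values (df, scale, lengthscale, jitter) are assumptions made only for illustration.

```python
import math
import numpy as np


def gaussian_logpdf(w, scale=1.0):
    """Elementwise log-density of a zero-mean Gaussian with std `scale`."""
    return (-0.5 * (w / scale) ** 2
            - math.log(scale) - 0.5 * math.log(2 * math.pi))


def student_t_logpdf(w, df=3.0, scale=1.0):
    """Elementwise log-density of a zero-mean Student-t (heavy-tailed).

    `df` and `scale` are illustrative values, not taken from the paper.
    """
    z = w / scale
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi) - math.log(scale))
    return log_norm - 0.5 * (df + 1) * np.log1p(z ** 2 / df)


def spatial_covariance(k=3, lengthscale=1.0, jitter=1e-6):
    """Assumed RBF covariance over the k*k positions of a conv filter."""
    coords = np.array([(i, j) for i in range(k) for j in range(k)],
                      dtype=float)
    sq_dist = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(-1)
    # Small jitter on the diagonal keeps the Cholesky factorization stable.
    return np.exp(-0.5 * sq_dist / lengthscale ** 2) + jitter * np.eye(k * k)


def correlated_gaussian_logpdf(filt, cov):
    """Log-density of one flattened filter under N(0, cov)."""
    x = np.asarray(filt, dtype=float).ravel()
    chol = np.linalg.cholesky(cov)
    alpha = np.linalg.solve(chol, x)          # alpha @ alpha = x^T cov^-1 x
    log_det = 2.0 * np.log(np.diag(chol)).sum()
    return -0.5 * (alpha @ alpha + log_det + x.size * math.log(2 * math.pi))


# Two 3x3 filters with identical norm: one smooth, one checkerboard.
smooth = np.full((3, 3), 0.5)
checker = 0.5 * np.array([[1, -1, 1], [-1, 1, -1], [1, -1, 1]], dtype=float)
cov = spatial_covariance()

print("isotropic :", gaussian_logpdf(smooth).sum(),
      gaussian_logpdf(checker).sum())
print("correlated:", correlated_gaussian_logpdf(smooth, cov),
      correlated_gaussian_logpdf(checker, cov))
print("tail w=6  :", gaussian_logpdf(6.0), student_t_logpdf(6.0))
```

The isotropic prior scores both filters identically because it only sees their norm, whereas the correlated prior prefers the spatially smooth filter. Similarly, the Student-t density penalizes a large weight far less than the Gaussian does, which is the sense in which a heavy-tailed prior tolerates the outlying weights seen in FCNNs.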