Quality of Uncertainty Quantification for Bayesian Neural Network Inference
This work addresses the problem of evaluating uncertainty quantification in Bayesian Neural Networks, which is relevant to both researchers and practitioners. Its contribution is incremental, however: it is an empirical comparison and introduces no new methods.
The paper empirically compares the quality of predictive uncertainty estimates from 10 common Bayesian Neural Network inference methods on regression and classification tasks. It finds that standard metrics such as test log-likelihood can be misleading, and that methods designed to capture posterior structure do not necessarily yield high-quality approximations.
Bayesian Neural Networks (BNNs) place priors over the parameters in a neural network. Inference in BNNs, however, is difficult; all inference methods for BNNs are approximate. In this work, we empirically compare the quality of predictive uncertainty estimates for 10 common inference methods on both regression and classification tasks. Our experiments demonstrate that commonly used metrics (e.g., test log-likelihood) can be misleading. Our experiments also indicate that inference innovations designed to capture structure in the posterior do not necessarily produce high-quality posterior approximations.
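
For readers unfamiliar with the test log-likelihood metric named in the abstract, the following is a minimal sketch of one common way to estimate it for regression. It is not taken from the paper: the function name `avg_test_log_likelihood`, the homoscedastic Gaussian likelihood, and the Monte Carlo mixture over approximate posterior samples are all assumptions made for illustration.

```python
import numpy as np


def avg_test_log_likelihood(y_true, pred_means, noise_std):
    """Average test log-likelihood under a Monte Carlo predictive mixture.

    Hypothetical helper, assuming a Gaussian likelihood with fixed noise.

    y_true:     shape (N,)   test targets
    pred_means: shape (S, N) network outputs for S approximate posterior samples
    noise_std:  float        assumed observation noise standard deviation
    """
    y_true = np.asarray(y_true, dtype=float)
    pred_means = np.asarray(pred_means, dtype=float)
    var = noise_std ** 2

    # log N(y_n | f_s(x_n), sigma^2) for every sample s and test point n
    log_probs = (
        -0.5 * np.log(2.0 * np.pi * var)
        - 0.5 * (y_true[None, :] - pred_means) ** 2 / var
    )

    # log of the mean over samples, computed stably (log-sum-exp trick)
    max_lp = log_probs.max(axis=0)
    log_mix = max_lp + np.log(np.mean(np.exp(log_probs - max_lp[None, :]), axis=0))

    # Average over test points
    return float(log_mix.mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    y = rng.normal(size=50)
    # Stand-in for posterior-sample predictions: targets plus random perturbations
    samples = y[None, :] + 0.1 * rng.normal(size=(20, 50))
    print(avg_test_log_likelihood(y, samples, noise_std=1.0))
```

Because this quantity averages a single scalar over test points, it summarizes predictive fit on held-out data and does not by itself measure how closely the approximate posterior matches the true one.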