LG ML · Oct 9, 2018

Deterministic Variational Inference for Robust Bayesian Neural Networks

Anqi Wu, Sebastian Nowozin, Edward Meeds, Richard E. Turner, José Miguel Hernández-Lobato, Alexander L. Gaunt

arXiv:1810.03958v2 · 26.4 · 102 citations · Has Code

Originality: Highly original

AI Analysis

This work addresses the limited practical use of variational Bayes for Bayesian neural networks, offering a more robust inference tool for applications like heteroscedastic regression.

The paper tackles the fragility of variational Bayes in Bayesian neural networks with two contributions: a deterministic moment approximation that eliminates gradient variance, and a hierarchical prior with an Empirical Bayes procedure that selects prior variances automatically. The resulting method is highly efficient and robust, and it demonstrates good predictive performance on heteroscedastic regression.

Bayesian neural networks (BNNs) hold great promise as a flexible and principled solution to deal with uncertainty when learning from finite data. Among approaches to realize probabilistic inference in deep neural networks, variational Bayes (VB) is theoretically grounded, generally applicable, and computationally efficient. With wide recognition of potential advantages, why is it that variational Bayes has seen very limited practical use for BNNs in real applications? We argue that variational inference in neural networks is fragile: successful implementations require careful initialization and tuning of prior variances, as well as controlling the variance of Monte Carlo gradient estimates. We provide two innovations that aim to turn VB into a robust inference tool for Bayesian neural networks: first, we introduce a novel deterministic method to approximate moments in neural networks, eliminating gradient variance; second, we introduce a hierarchical prior for parameters and a novel Empirical Bayes procedure for automatically selecting prior variances. Combining these two innovations, the resulting method is highly efficient and robust. On the application of heteroscedastic regression we demonstrate good predictive performance over alternative approaches.
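
To make the idea of deterministic moment approximation concrete, here is a minimal sketch in plain Python. It is not the paper's algorithm. It assumes a fully factorized Gaussian over the weights, tracks only per-unit means and variances (covariances between units are dropped), and treats each pre-activation as Gaussian. ReLU is chosen only for illustration, and the function names `linear_moments` and `relu_moments` are hypothetical. The closed-form expressions used are the standard moments of a product of independent variables and of a rectified Gaussian.

```python
import math


def linear_moments(x_mean, x_var, w_mean, w_var):
    """Mean and variance of y_j = sum_i x_i * W_ij.

    Assumes all x_i and W_ij are mutually independent.
    x_mean, x_var: lists of length n_in.
    w_mean, w_var: nested lists of shape [n_in][n_out].
    """
    n_in, n_out = len(w_mean), len(w_mean[0])
    y_mean, y_var = [], []
    for j in range(n_out):
        m = sum(x_mean[i] * w_mean[i][j] for i in range(n_in))
        # For independent x and w:
        # Var[x*w] = Var[x]*E[w]^2 + E[x]^2*Var[w] + Var[x]*Var[w]
        v = sum(
            x_var[i] * w_mean[i][j] ** 2
            + x_mean[i] ** 2 * w_var[i][j]
            + x_var[i] * w_var[i][j]
            for i in range(n_in)
        )
        y_mean.append(m)
        y_var.append(v)
    return y_mean, y_var


def relu_moments(mu, var):
    """Mean and variance of max(0, z) for z ~ N(mu, var)."""
    if var <= 0.0:
        # Degenerate case: the input is deterministic.
        return max(mu, 0.0), 0.0
    sigma = math.sqrt(var)
    a = mu / sigma
    pdf = math.exp(-0.5 * a * a) / math.sqrt(2.0 * math.pi)  # standard normal density at a
    cdf = 0.5 * (1.0 + math.erf(a / math.sqrt(2.0)))         # standard normal CDF at a
    mean = mu * cdf + sigma * pdf
    second_moment = (mu * mu + var) * cdf + mu * sigma * pdf
    # Clamp at zero to guard against tiny negative values from round-off.
    return mean, max(second_moment - mean * mean, 0.0)


# One hidden layer: a deterministic input passes through uncertain weights,
# then through a ReLU, with no sampling involved.
h_mean, h_var = linear_moments(
    [1.0, 2.0], [0.0, 0.0], [[1.0], [1.0]], [[0.5], [0.25]]
)
a_mean, a_var = relu_moments(h_mean[0], h_var[0])
```

Because every quantity is computed in closed form, an objective built from these moments is a deterministic function of the variational parameters, so its gradient carries no Monte Carlo noise. The cost of this simplified version is that it ignores correlations between units.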
