LG ML Jan 29, 2020

The Case for Bayesian Deep Learning

arXiv:2001.10995v1
121 citations
Originality Synthesis-oriented
AI Analysis

This work addresses the problem of improving reliability and performance in deep learning, for both researchers and practitioners. Its contribution is incremental: it synthesizes existing arguments and evidence rather than introducing a new method.

The paper argues that Bayesian deep learning, which uses marginalization over parameters instead of optimization, is particularly effective for neural networks due to their underspecification and structural properties, leading to improvements in accuracy and calibration compared to standard training methods.
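As a hedged illustration of the contrast between optimization and marginalization, the two prediction rules can be written as below. The notation is an assumption made for this sketch and does not appear on this page: w denotes the network parameters, D the training data, x a test input, and y its label.

```latex
% Optimization: predict with a single parameter setting, e.g. the posterior mode
\hat{w} = \arg\max_{w} \, p(w \mid \mathcal{D}), \qquad
p(y \mid x, \mathcal{D}) \approx p(y \mid x, \hat{w})

% Marginalization: average the predictions of all parameter settings,
% each weighted by its posterior probability (a Bayesian model average)
p(y \mid x, \mathcal{D}) = \int p(y \mid x, w) \, p(w \mid \mathcal{D}) \, dw
```

When the posterior is sharply peaked around one setting, the two rules nearly coincide. They diverge when many distinct settings explain the data well, which is the underspecified regime the summary refers to.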

The key distinguishing property of a Bayesian approach is marginalization instead of optimization, not the prior, or Bayes rule. Bayesian inference is especially compelling for deep neural networks. (1) Neural networks are typically underspecified by the data, and can represent many different but high performing models corresponding to different settings of parameters, which is exactly when marginalization will make the biggest difference for both calibration and accuracy. (2) Deep ensembles have been mistaken as competing approaches to Bayesian methods, but can be seen as approximate Bayesian marginalization. (3) The structure of neural networks gives rise to a structured prior in function space, which reflects the inductive biases of neural networks that help them generalize. (4) The observed correlation between parameters in flat regions of the loss and a diversity of solutions that provide good generalization is further conducive to Bayesian marginalization, as flat regions occupy a large volume in a high dimensional space, and each different solution will make a good contribution to a Bayesian model average. (5) Recent practical advances for Bayesian deep learning provide improvements in accuracy and calibration compared to standard training, while retaining scalability.
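The view of deep ensembles as approximate Bayesian marginalization can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's implementation: the function name `model_average` is hypothetical, the probabilities are invented for the example, and weighting every model equally is a simplification of a posterior-weighted average.

```python
import numpy as np


def model_average(member_probs):
    """Equal-weight average of per-model predictive distributions.

    member_probs has shape (n_models, n_inputs, n_classes); each row along
    the last axis is one model's class probabilities for one input.
    """
    member_probs = np.asarray(member_probs, dtype=float)
    if member_probs.ndim != 3:
        raise ValueError("expected shape (n_models, n_inputs, n_classes)")
    # Averaging over the model axis approximates the integral over parameters
    # with a finite sum over a handful of trained networks.
    return member_probs.mean(axis=0)


# Invented predictions from three models on two inputs and three classes.
# The models agree on the first input and disagree on the second.
member_probs = np.array([
    [[0.90, 0.05, 0.05], [0.60, 0.30, 0.10]],
    [[0.80, 0.10, 0.10], [0.20, 0.70, 0.10]],
    [[0.85, 0.10, 0.05], [0.10, 0.20, 0.70]],
])

averaged = model_average(member_probs)
print(averaged)
```

On the second input each individual model is fairly confident in a different class, while the average spreads its probability across all three. That is the mechanism by which averaging over diverse solutions can improve calibration relative to trusting a single trained network.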

