LG · ML · Feb 23, 2022

Wide Mean-Field Bayesian Neural Networks Ignore the Data

arXiv:2202.11670v1 · 26 citations
Originality: Incremental advance
AI Analysis

This work addresses a theoretical limitation of approximate inference for Bayesian neural networks. The advance is incremental, but it matters to researchers in Bayesian deep learning.

The paper shows that mean-field variational inference in wide Bayesian neural networks with odd activation functions fails to model the data: as the width increases, the optimal variational posterior predictive distribution converges to the prior predictive distribution. The accompanying theoretical bounds are currently too loose for practical use.

Bayesian neural networks (BNNs) combine the expressive power of deep learning with the advantages of Bayesian formalism. In recent years, the analysis of wide, deep BNNs has provided theoretical insight into their priors and posteriors. However, we have no analogous insight into their posteriors under approximate inference. In this work, we show that mean-field variational inference entirely fails to model the data when the network width is large and the activation function is odd. Specifically, for fully-connected BNNs with odd activation functions and a homoscedastic Gaussian likelihood, we show that the optimal mean-field variational posterior predictive (i.e., function space) distribution converges to the prior predictive distribution as the width tends to infinity. We generalize aspects of this result to other likelihoods. Our theoretical results are suggestive of underfitting behavior previously observed in BNNs. While our convergence bounds are non-asymptotic and constants in our analysis can be computed, they are currently too loose to be applicable in standard training regimes. Finally, we show that the optimal approximate posterior need not tend to the prior if the activation function is not odd, showing that our statements cannot be generalized arbitrarily.
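
To make the objective behind the abstract concrete, here is a minimal sketch of the two terms that mean-field variational inference trades off: an expected Gaussian log-likelihood and a KL penalty that pulls the approximate posterior toward the prior. It assumes a fully factorized Gaussian approximate posterior and a Gaussian prior over the weights. The function names (`kl_diag_gaussians`, `gaussian_log_lik`, `elbo_estimate`) are hypothetical and are not taken from the paper or its repository.

```python
import numpy as np


def kl_diag_gaussians(mu_q, sigma_q, mu_p, sigma_p):
    """KL(q || p) for factorized Gaussians, summed over all weights.

    Assumption: both q and p are products of independent 1-D Gaussians.
    """
    mu_q, sigma_q, mu_p, sigma_p = (
        np.asarray(a, dtype=float) for a in (mu_q, sigma_q, mu_p, sigma_p)
    )
    per_weight = (
        np.log(sigma_p / sigma_q)
        + (sigma_q**2 + (mu_q - mu_p) ** 2) / (2.0 * sigma_p**2)
        - 0.5
    )
    return float(per_weight.sum())


def gaussian_log_lik(y, f, noise_std):
    """Homoscedastic Gaussian log-likelihood of targets y given outputs f."""
    y, f = np.asarray(y, dtype=float), np.asarray(f, dtype=float)
    return float(
        np.sum(
            -0.5 * np.log(2.0 * np.pi * noise_std**2)
            - (y - f) ** 2 / (2.0 * noise_std**2)
        )
    )


def elbo_estimate(y, f_samples, noise_std, kl):
    """Monte Carlo ELBO estimate.

    f_samples has shape (S, N): network outputs at the N inputs under
    S weight draws from q. How those draws are produced is left out here.
    """
    expected_ll = np.mean([gaussian_log_lik(y, f, noise_std) for f in f_samples])
    return float(expected_ll - kl)
```

The KL term is zero only when the approximate posterior equals the prior, and it is summed over every weight, so it grows with the number of weights that move away from the prior. The sketch only shows the shape of the objective; it does not reproduce the paper's analysis of the wide-network limit.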

Code Implementations: 1 repo
