LG · ML · Oct 22, 2024

Bayes without Underfitting: Fully Correlated Deep Learning Posteriors via Alternating Projections

arXiv:2410.16901v1 · 4 citations · h-index: 8 · AISTATS
Originality: Highly original
AI Analysis

This paper addresses the accuracy-uncertainty trade-off in Bayesian deep learning, offering practitioners a scalable way to prevent underfitting in large models.

The paper tackles underfitting in Bayesian deep learning, where Bayesian predictions are less accurate than those of a point estimate. It proposes building Bayesian approximations in the null space of the generalized Gauss-Newton (GGN) matrix, which guarantees that the Bayesian predictive does not underfit. The result is a scalable algorithm whose cost grows linearly with the number of parameters and quadratically with the number of output dimensions, demonstrated on large models such as vision transformers with 28 million parameters.
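The null-space claim can be checked numerically on a toy linearized model. The following is a minimal sketch under stated assumptions, not the paper's code: a random stacked Jacobian `J` stands in for a real network's, the output-space loss Hessian `H` is taken to be the identity (as for a squared-error loss), and the generalized Gauss-Newton (GGN) matrix `G = J^T H J` is formed explicitly. The check is that a direction in the null space of `J` leaves the linearized training predictions unchanged and is also annihilated by `G`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linearized model: N training points, O outputs, P parameters.
# P > N * O (overparameterized), so the null space is non-trivial.
N, O, P = 5, 2, 30
J = rng.standard_normal((N * O, P))   # hypothetical stacked per-example Jacobians
H = np.eye(N * O)                     # assumed output-space loss Hessian (MSE-like)

# Generalized Gauss-Newton matrix G = J^T H J.
G = J.T @ H @ J

# Orthonormal basis of null(J): right-singular vectors beyond rank(J).
_, s, Vt = np.linalg.svd(J)
rank = int(np.sum(s > 1e-10 * s[0]))
null_basis = Vt[rank:].T              # shape (P, P - rank)

# A random direction in null(J): the linearized change in training
# predictions, J @ v, is zero, and since H is positive definite, G @ v is too.
v = null_basis @ rng.standard_normal(null_basis.shape[1])
print(np.linalg.norm(J @ v), np.linalg.norm(G @ v))
```

Forming `G` explicitly is only feasible at toy scale; at 28 million parameters the paper instead relies on a matrix-free algorithm.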

Bayesian deep learning all too often underfits so that the Bayesian prediction is less accurate than a simple point estimate. Uncertainty quantification then comes at the cost of accuracy. For linearized models, the null space of the generalized Gauss-Newton matrix corresponds to parameters that preserve the training predictions of the point estimate. We propose to build Bayesian approximations in this null space, thereby guaranteeing that the Bayesian predictive does not underfit. We suggest a matrix-free algorithm for projecting onto this null space, which scales linearly with the number of parameters and quadratically with the number of output dimensions. We further propose an approximation that only scales linearly with parameters to make the method applicable to generative models. An extensive empirical evaluation shows that the approach scales to large models, including vision transformers with 28 million parameters.
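The page does not spell out the projection algorithm, so what follows is only a minimal sketch of one plausible reading of the title's "alternating projections", under explicit assumptions. The stacked Jacobian is split into hypothetical mini-batches `J_b`. Each projection `v - J_b^T (J_b J_b^T)^{-1} J_b v` needs only products with `J_b` and `J_b^T` plus a small solve sized by the batch's output rows. Cycling through the batches (von Neumann's alternating projections, extended to several subspaces by Halperin) converges to the projection onto the full null space. A posterior sample is then the point estimate plus a scaled projected Gaussian draw; `theta_star` and `alpha` are hypothetical stand-ins.

```python
import numpy as np


def project_batch(J_b, v):
    """Project v onto null(J_b): v - J_b^T (J_b J_b^T)^{-1} J_b v.

    Uses products with J_b and J_b.T plus a small (rows x rows) solve."""
    u = np.linalg.solve(J_b @ J_b.T, J_b @ v)
    return v - J_b.T @ u


def alternating_projection(batches, v, n_sweeps=500):
    """Cycle through the per-batch null-space projections.

    For finitely many subspaces, the cyclic iterates converge to the
    projection of v onto their intersection (Halperin), here null(J)."""
    for _ in range(n_sweeps):
        for J_b in batches:
            v = project_batch(J_b, v)
    return v


rng = np.random.default_rng(1)
N_BATCHES, ROWS, P = 4, 3, 40          # hypothetical toy sizes
batches = [rng.standard_normal((ROWS, P)) for _ in range(N_BATCHES)]
J = np.vstack(batches)                 # full stacked Jacobian, shape (12, 40)

# Project a standard Gaussian draw onto the null space.
eps = rng.standard_normal(P)
v_alt = alternating_projection(batches, eps)

# Reference: exact projection onto null(J) via the pseudo-inverse.
v_exact = eps - np.linalg.pinv(J) @ (J @ eps)

# Hypothetical posterior sample: point estimate plus scaled null-space noise,
# i.e. approximately a draw from N(theta_star, alpha * P_null).
theta_star = rng.standard_normal(P)    # stand-in for the trained weights
alpha = 0.5                            # hypothetical variance scale
theta_sample = theta_star + np.sqrt(alpha) * v_alt

# Linearized training predictions J @ theta are (numerically) unchanged.
print(np.linalg.norm(v_alt - v_exact))
print(np.linalg.norm(J @ theta_sample - J @ theta_star))
```

The Gram matrix `J_b @ J_b.T` has one row per (example, output) pair in the batch, so its size grows quadratically with the number of outputs. In a real network, `J_b @ v` and `J_b.T @ u` would be Jacobian-vector and vector-Jacobian products rather than dense matrix multiplies, and the small solve could be done iteratively using only those products.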
