ML, LG, ST · Jun 5, 2024

Posterior and variational inference for deep neural networks with heavy-tailed weights

arXiv:2406.03369v2 · 13 citations
AI Analysis

This paper provides a theoretical foundation for Bayesian deep learning with heavy-tailed priors, offering adaptive and efficient inference for practitioners in nonparametric regression and related fields, though it builds incrementally on prior work.

The paper tackles the problem of Bayesian deep learning by introducing a heavy-tailed prior for neural network weights, achieving near-optimal minimax contraction rates adaptive to intrinsic dimension and smoothness without requiring hyperparameter sampling for architecture learning.
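As a reading aid, the notion of a contraction rate mentioned above can be written out as follows. This is a sketch of the standard definition and of the classical minimax benchmark for a β-smooth regression function on [0,1]^d; the norm, the constant M, the logarithmic factors implied by "near-optimal", and the exact function classes are those of the paper and are not stated in this summary.

```latex
% Posterior contraction at rate \varepsilon_n around the true function f_0:
\[
  \mathbb{E}_{f_0}\,\Pi\bigl(f : \|f - f_0\| > M\varepsilon_n \,\big|\, \text{data}\bigr)
  \longrightarrow 0 \quad (n \to \infty).
\]
% Classical minimax benchmark for a \beta-smooth f_0 in dimension d:
\[
  \varepsilon_n \asymp n^{-\beta/(2\beta + d)} .
\]
```

Adaptation means the prior attains such a rate without being told β or d; under a low-dimensional structure, one would expect d to be replaced by a smaller effective dimension, which is how "adaptive to intrinsic dimension" is read here.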

We consider deep neural networks in a Bayesian framework with a prior distribution sampling the network weights at random. Following a recent idea of Agapiou and Castillo (2023), who show that heavy-tailed prior distributions achieve automatic adaptation to smoothness, we introduce a simple Bayesian deep learning prior based on heavy-tailed weights and ReLU activation. We show that the corresponding posterior distribution achieves near-optimal minimax contraction rates, simultaneously adaptive to both intrinsic dimension and smoothness of the underlying function, in a variety of contexts including nonparametric regression, geometric data and Besov spaces. While most works so far need a form of model selection built into the prior distribution, a key aspect of our approach is that it does not require sampling hyperparameters to learn the architecture of the network. We also provide variational Bayes counterparts of the results, which show that mean-field variational approximations still benefit from near-optimal theoretical support.
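To make the idea of a prior that samples the network weights at random concrete, here is a minimal sketch of one draw from a heavy-tailed prior over a fixed ReLU architecture. It is an illustration under stated assumptions, not the paper's construction: the abstract does not specify the heavy-tailed density or any layer-dependent scaling, so the Student-t distribution, the `df` and `scale` parameters, and the function names `student_t`, `sample_network` and `forward` are all hypothetical choices.

```python
import math
import random


def student_t(df, rng):
    """One Student-t draw: Z / sqrt(V / df), Z ~ N(0, 1), V ~ chi-square(df)."""
    z = rng.gauss(0.0, 1.0)
    # chi-square(df) is a Gamma distribution with shape df/2 and scale 2.
    v = rng.gammavariate(df / 2.0, 2.0)
    return z / math.sqrt(v / df)


def sample_network(widths, df=3.0, scale=1.0, rng=None):
    """Draw all weights and biases i.i.d. from scale * Student-t(df).

    Assumption: Student-t stands in for the heavy-tailed prior; the
    architecture (widths) is fixed in advance and not sampled.
    """
    rng = rng or random.Random()
    layers = []
    for n_in, n_out in zip(widths[:-1], widths[1:]):
        W = [[scale * student_t(df, rng) for _ in range(n_in)]
             for _ in range(n_out)]
        b = [scale * student_t(df, rng) for _ in range(n_out)]
        layers.append((W, b))
    return layers


def forward(layers, x):
    """Evaluate the network: ReLU on hidden layers, linear output layer."""
    h = list(x)
    for i, (W, b) in enumerate(layers):
        h = [sum(w * v for w, v in zip(row, h)) + bias
             for row, bias in zip(W, b)]
        if i < len(layers) - 1:
            h = [max(v, 0.0) for v in h]
    return h


rng = random.Random(0)
net = sample_network([2, 8, 8, 1], rng=rng)  # 2 inputs, two hidden layers, 1 output
y = forward(net, [0.3, 0.7])                 # one random function evaluated at a point
```

Each call to `sample_network` yields one random function from the prior. The architecture is passed in once and never resampled, which mirrors the abstract's point that no hyperparameters need to be sampled to learn the architecture.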

