ML, LG
Jun 21, 2024

Flat Posterior Does Matter For Bayesian Model Averaging

arXiv:2406.15664v5
2 citations
AI Analysis

The paper addresses a specific bottleneck in Bayesian neural networks that is relevant to both researchers and practitioners, but its contribution is incremental: it builds on existing BMA methods by adding a focus on flatness.

Bayesian Model Averaging (BMA) in neural networks often overlooks posterior flatness, which can hinder generalization. The paper proposes FP-BMA, a training objective that encourages flat posteriors, and shows empirically that it improves generalization performance.

Bayesian neural networks (BNNs) estimate the posterior distribution of model parameters and utilize posterior samples for Bayesian Model Averaging (BMA) in prediction. However, despite the crucial role of flatness in the loss landscape in improving the generalization of neural networks, its impact on BMA has been largely overlooked. In this work, we explore how posterior flatness influences BMA generalization and empirically demonstrate that (1) most approximate Bayesian inference methods fail to yield a flat posterior and (2) BMA predictions, without considering posterior flatness, are less effective at improving generalization. To address this, we propose Flat Posterior-aware Bayesian Model Averaging (FP-BMA), a novel training objective that explicitly encourages flat posteriors in a principled Bayesian manner. We also introduce a Flat Posterior-aware Bayesian Transfer Learning scheme that enhances generalization in downstream tasks. Empirically, we show that FP-BMA successfully captures flat posteriors, improving generalization performance.
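To make the two ideas in the abstract concrete, here is a minimal NumPy sketch, not the paper's FP-BMA objective. It assumes posterior samples are already available as per-sample logits, shows BMA as the average of predictive probabilities over those samples, and adds a crude flatness probe that measures how much a loss rises under small random perturbations of the parameters. The names `bma_predict` and `sharpness_proxy`, and the parameters `radius` and `n_dirs`, are hypothetical and chosen only for illustration.

```python
import numpy as np


def softmax(logits, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def bma_predict(logits_per_sample):
    """Bayesian Model Averaging over posterior samples.

    logits_per_sample: array of shape (S, N, C), i.e. S posterior samples,
    N inputs, C classes. Returns averaged class probabilities of shape (N, C).
    """
    probs = softmax(np.asarray(logits_per_sample, dtype=float), axis=-1)
    # Average probabilities (not logits) across the sample dimension.
    return probs.mean(axis=0)


def sharpness_proxy(loss_fn, theta, radius=0.05, n_dirs=20, seed=0):
    """Illustrative flatness probe (an assumption, not the paper's measure).

    Returns the mean increase in loss when theta is moved by a random
    direction of fixed Euclidean norm `radius`. Smaller means flatter.
    """
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta, dtype=float)
    base = loss_fn(theta)
    increases = []
    for _ in range(n_dirs):
        direction = rng.standard_normal(theta.shape)
        direction *= radius / np.linalg.norm(direction)
        increases.append(loss_fn(theta + direction) - base)
    return float(np.mean(increases))
```

As a usage example, passing logits of shape `(S, N, C)` to `bma_predict` yields one probability row per input, and comparing `sharpness_proxy` on two quadratic losses with different curvature gives a larger value for the sharper one. Averaging probabilities rather than logits matches the usual definition of the BMA predictive distribution.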

Code Implementations: 1 repo
