LG | ML | Mar 17, 2025

On Local Posterior Structure in Deep Ensembles

arXiv:2503.13296v1 | h-index: 3 | Has Code | AISTATS
Originality: Incremental advance
AI Analysis

This work addresses calibration and uncertainty quantification in machine learning. For practitioners using ensemble methods, it reveals a counterintuitive trade-off between in-distribution and out-of-distribution performance.

The paper investigates deep ensembles of Bayesian Neural Networks (DE-BNNs) and finds that, contrary to expectations, large deep ensembles (DEs) consistently outperform DE-BNNs on in-distribution data, while DE-BNNs show better out-of-distribution performance at the cost of in-distribution accuracy.

Bayesian Neural Networks (BNNs) often improve model calibration and predictive uncertainty quantification compared to point estimators such as maximum a posteriori (MAP) estimates. Deep ensembles (DEs) are likewise known to improve calibration, so it is natural to hypothesize that deep ensembles of BNNs (DE-BNNs) should provide even further improvements. In this work, we systematically investigate this hypothesis across a number of datasets, neural network architectures, and BNN approximation methods and, surprisingly, find that when the ensembles grow large enough, DEs consistently outperform DE-BNNs on in-distribution data. To shed light on this observation, we conduct several sensitivity and ablation studies. Moreover, we show that even though DE-BNNs outperform DEs on out-of-distribution metrics, this comes at the cost of decreased in-distribution performance. As a final contribution, we open-source the large pool of trained models to facilitate further research on this topic.
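
To make the comparison in the abstract concrete, the following is a minimal NumPy sketch of how a DE and a DE-BNN prediction are commonly formed and scored. It is not the authors' code: the function names (`de_predict`, `de_bnn_predict`, `nll`, `ece`) are hypothetical, the logits are synthetic, and Gaussian noise around each member's logits is only an assumed stand-in for samples from a local posterior approximation. Negative log-likelihood and expected calibration error are used here as typical calibration metrics; the abstract does not say which metrics the paper reports.

```python
import numpy as np


def softmax(logits, axis=-1):
    """Numerically stable softmax along the class axis."""
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def de_predict(member_logits):
    """Deep ensemble: average class probabilities over M point-estimate members.

    member_logits has shape (M, N, C).
    """
    return softmax(member_logits).mean(axis=0)


def de_bnn_predict(sample_logits):
    """DE-BNN: average over M members and S posterior samples per member.

    sample_logits has shape (M, S, N, C).
    """
    return softmax(sample_logits).mean(axis=(0, 1))


def nll(probs, labels):
    """Mean negative log-likelihood of the true labels."""
    picked = probs[np.arange(len(labels)), labels]
    return -np.mean(np.log(picked + 1e-12))


def ece(probs, labels, n_bins=10):
    """Expected calibration error with equal-width confidence bins."""
    conf = probs.max(axis=1)
    correct = (probs.argmax(axis=1) == labels).astype(float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    total = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (conf > lo) & (conf <= hi)
        if mask.any():
            # Bin weight times |accuracy - mean confidence| within the bin.
            total += mask.mean() * abs(correct[mask].mean() - conf[mask].mean())
    return total


# Synthetic stand-in data (assumption): M members, S samples, N inputs, C classes.
rng = np.random.default_rng(0)
M, S, N, C = 5, 8, 200, 3
labels = rng.integers(0, C, size=N)

# Each member's logits favour the true class, with member-specific noise.
member_logits = rng.normal(size=(M, N, C))
member_logits[:, np.arange(N), labels] += 2.0

# Assumed local posterior: Gaussian perturbations around each member's logits.
sample_logits = member_logits[:, None] + 0.5 * rng.normal(size=(M, S, N, C))

p_de = de_predict(member_logits)
p_de_bnn = de_bnn_predict(sample_logits)

print("DE     NLL / ECE:", nll(p_de, labels), ece(p_de, labels))
print("DE-BNN NLL / ECE:", nll(p_de_bnn, labels), ece(p_de_bnn, labels))
```

The numbers printed by this toy setup say nothing about the paper's findings; the sketch only shows that a DE-BNN averages over one extra axis (posterior samples within each member) before the same metrics are applied.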

Code Implementations: 1 repo
