ML, LG · Jul 11, 2020

Bayesian Deep Ensembles via the Neural Tangent Kernel

arXiv:2007.05864v2 · 133 citations
Originality: Highly original
AI Analysis

The paper provides a theoretical foundation for interpreting deep ensembles as Bayesian methods, which can improve uncertainty estimation for machine learning practitioners, though it builds incrementally on existing NTK and ensemble work.

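The paper addresses the lack of a Gaussian process (GP) posterior interpretation for deep ensembles trained with squared error loss. It adds a computationally tractable, randomised, untrainable function to each ensemble member during training, which enables a posterior interpretation in the infinite width limit, and it proves that the resulting Bayesian deep ensembles make more conservative predictions than standard deep ensembles in that limit. With finite width networks, Bayesian deep ensembles can outperform standard ensembles in various out-of-distribution settings, for both regression and classification tasks.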

We explore the link between deep ensembles and Gaussian processes (GPs) through the lens of the Neural Tangent Kernel (NTK): a recent development in understanding the training dynamics of wide neural networks (NNs). Previous work has shown that even in the infinite width limit, when NNs become GPs, there is no GP posterior interpretation to a deep ensemble trained with squared error loss. We introduce a simple modification to standard deep ensembles training, through addition of a computationally-tractable, randomised and untrainable function to each ensemble member, that enables a posterior interpretation in the infinite width limit. When ensembled together, our trained NNs give an approximation to a posterior predictive distribution, and we prove that our Bayesian deep ensembles make more conservative predictions than standard deep ensembles in the infinite width limit. Finally, using finite width NNs we demonstrate that our Bayesian deep ensembles faithfully emulate the analytic posterior predictive when available, and can outperform standard deep ensembles in various out-of-distribution settings, for both regression and classification tasks.
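As a rough illustration of the modification described in the abstract, the NumPy sketch below uses a toy linearised model instead of a real neural network. It is not the paper's implementation, and every name in it (`features`, `converged_prediction`, `ensemble_predictions`, `theta_star`) is hypothetical. The assumptions are: random cosine features stand in for NTK features; only the first block of parameters contributes to the function at initialisation, mimicking the mismatch between the initial function's kernel and the training kernel; and training is replaced by the closed-form limit of gradient flow on squared error for a linearised model. Setting `add_delta=True` adds a fixed random function built from the second feature block to each member.

```python
import numpy as np

def features(x, freqs, phases):
    # Random cosine features, shape (len(x), len(freqs)). Toy stand-in for NTK features.
    return np.cos(np.outer(x, freqs) + phases) / np.sqrt(len(freqs))

def converged_prediction(f0_train, f0_test, ntk_test_train, ntk_train_train, y_train):
    # Limit of gradient flow on squared error for a linearised model:
    # f(x*) = f0(x*) + Theta(x*, X) Theta(X, X)^{-1} (y - f0(X)).
    correction = np.linalg.solve(ntk_train_train, y_train - f0_train)
    return f0_test + ntk_test_train @ correction

def ensemble_predictions(x_train, y_train, x_test, n_members=500, add_delta=True, seed=0):
    rng = np.random.default_rng(seed)
    n_feat = 40
    freqs = rng.normal(size=n_feat)
    phases = rng.uniform(0.0, 2.0 * np.pi, size=n_feat)
    phi_train = features(x_train, freqs, phases)
    phi_test = features(x_test, freqs, phases)

    # Block a generates the function at initialisation; block b only affects training.
    a, b = slice(0, n_feat // 2), slice(n_feat // 2, n_feat)

    # The training kernel uses all features (both blocks).
    ntk_train_train = phi_train @ phi_train.T
    ntk_test_train = phi_test @ phi_train.T

    preds = []
    for _ in range(n_members):
        w0 = rng.normal(size=n_feat // 2)
        f0_train = phi_train[:, a] @ w0
        f0_test = phi_test[:, a] @ w0
        if add_delta:
            # Randomised, untrainable function added to this member (hypothetical form).
            theta_star = rng.normal(size=n_feat // 2)
            f0_train = f0_train + phi_train[:, b] @ theta_star
            f0_test = f0_test + phi_test[:, b] @ theta_star
        preds.append(converged_prediction(
            f0_train, f0_test, ntk_test_train, ntk_train_train, y_train))
    return np.stack(preds)  # shape (n_members, len(x_test))

if __name__ == "__main__":
    x_train = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
    y_train = np.sin(x_train)
    x_test = np.array([0.5, 6.0])  # 6.0 lies far from the training inputs
    for flag in (False, True):
        p = ensemble_predictions(x_train, y_train, x_test, add_delta=flag)
        print("add_delta =", flag, "mean:", p.mean(axis=0), "variance:", p.var(axis=0))
```

In this toy setting, the members with the added function start from an initial function whose covariance matches the training kernel, so the converged members behave like samples from a GP posterior with that kernel. The ensemble variance away from the training data should then be larger than without the added function, which loosely mirrors the "more conservative predictions" claim.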

Code Implementations: 3 repos
