LG | Oct 18, 2022

Disentangling the Predictive Variance of Deep Ensembles through the Neural Tangent Kernel

arXiv:2210.09818v1 | 4 citations | h-index: 16
Originality: Incremental advance
AI Analysis

This work addresses a theoretical gap in uncertainty estimation for deep learning and offers incremental improvements for practitioners who rely on OOD detection in safety-critical applications.

The paper addresses the lack of theoretical understanding of deep ensembles for out-of-distribution detection. Analyzing them through the neural tangent kernel, it identifies two noise sources that affect the predictive variance and proposes methods to reduce this noise, which improves OOD detection in trained models.

Identifying unfamiliar inputs, also known as out-of-distribution (OOD) detection, is a crucial property of any decision-making process. A simple and empirically validated technique is based on deep ensembles, where the variance of predictions over different neural networks acts as a substitute for input uncertainty. Nevertheless, a theoretical understanding of the inductive biases behind the performance of deep ensembles' uncertainty estimation is missing. To improve our description of their behavior, we study deep ensembles with large layer widths operating in simplified linear training regimes, in which the functions trained with gradient descent can be described by the neural tangent kernel. We identify two sources of noise, each inducing a distinct inductive bias in the predictive variance at initialization. We further show theoretically and empirically that both noise sources affect the predictive variance of non-linear deep ensembles in toy models and realistic settings after training. Finally, we propose practical ways to eliminate part of these noise sources, leading to significant changes and improved OOD detection in trained deep ensembles.
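
The idea of using ensemble variance as an uncertainty signal can be illustrated with a small sketch. This is not the paper's method or its neural tangent kernel analysis. It is a toy stand-in that assumes each ensemble member is a random-feature model (fixed random tanh features with a ridge-regressed linear readout), which is one simple example of a model that is linear in its trained parameters. All names (`random_feature_member`, `ensemble_variance`) and all settings (feature count, ridge strength, input ranges) are hypothetical choices made for illustration.

```python
import numpy as np


def random_feature_member(x_train, y_train, n_features, rng, ridge=1e-3):
    """Fit one toy ensemble member (assumed setup, not the paper's model).

    The hidden features are random and stay fixed; only the linear readout
    is fitted, here in closed form with ridge regression.
    """
    w = rng.normal(size=(x_train.shape[1], n_features))
    b = rng.normal(size=n_features)

    def phi(x):
        # Random tanh features, scaled by 1/sqrt(width).
        return np.tanh(x @ w + b) / np.sqrt(n_features)

    feats = phi(x_train)
    gram = feats.T @ feats + ridge * np.eye(n_features)
    readout = np.linalg.solve(gram, feats.T @ y_train)
    return lambda x: phi(x) @ readout


def ensemble_variance(members, x):
    """Variance of the members' predictions at each input point."""
    preds = np.stack([m(x) for m in members])  # shape: (n_members, n_points)
    return preds.var(axis=0)


rng = np.random.default_rng(0)

# Toy 1-D regression data confined to [-1, 1].
x_train = rng.uniform(-1.0, 1.0, size=(20, 1))
y_train = np.sin(3.0 * x_train[:, 0])

# Members differ only through their random initialization.
members = [random_feature_member(x_train, y_train, 256, rng) for _ in range(10)]

# Inputs inside the training range versus far outside it.
x_in = np.linspace(-0.5, 0.5, 5).reshape(-1, 1)
x_out = np.linspace(3.0, 5.0, 5).reshape(-1, 1)

var_in = ensemble_variance(members, x_in)
var_out = ensemble_variance(members, x_out)

print("mean variance, in-distribution:", var_in.mean())
print("mean variance, out-of-distribution:", var_out.mean())
```

In this sketch the only source of disagreement between members is the random draw of their features, so the variance reflects initialization noise alone. Thresholding that variance gives a basic OOD score; the paper's contribution concerns which noise sources shape this quantity, which the toy model does not separate.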

