LG · STML · Jul 10, 2020

Characteristics of Monte Carlo Dropout in Wide Neural Networks

arXiv:2007.05434v1 · 9 citations
Originality: Incremental advance
AI Analysis

This work addresses uncertainty estimation in neural networks. It offers theoretical insights but is incremental, since it builds on prior work on Gaussian process approximations.

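The paper investigates the theoretical properties of Monte Carlo dropout in wide neural networks. It proves that wide untrained networks under dropout converge to Gaussian processes for fixed weights and biases, but it finds empirical evidence of correlations and non-Gaussian behavior in the pre-activations of finite-width networks, and it examines how strongly correlated pre-activations can induce that behavior.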

Monte Carlo (MC) dropout is one of the state-of-the-art approaches for uncertainty estimation in neural networks (NNs). It has been interpreted as approximately performing Bayesian inference. Based on previous work on the approximation of Gaussian processes by wide and deep neural networks with random weights, we study the limiting distribution of wide untrained NNs under dropout more rigorously and prove that they, too, converge to Gaussian processes for fixed sets of weights and biases. We sketch an argument that this property might also hold for infinitely wide feed-forward networks that are trained with (full-batch) gradient descent. The theory is contrasted with an empirical analysis in which we find correlations and non-Gaussian behavior for the pre-activations of finite-width NNs. We therefore investigate how (strongly) correlated pre-activations can induce non-Gaussian behavior in NNs with strongly correlated weights.
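
To make the setting in the abstract concrete, the following is a minimal NumPy sketch, not the paper's experimental setup. It draws one fixed set of random weights and biases for an untrained network, then resamples only the dropout masks and measures how close the output distribution is to a Gaussian. The one-hidden-layer ReLU architecture, the Gaussian initialization scales, the inverted-dropout scaling, and the names `mc_dropout_outputs` and `skew_and_excess_kurtosis` are all assumptions chosen for illustration.

```python
import numpy as np


def mc_dropout_outputs(x, width=1024, keep_prob=0.8, n_samples=2000, seed=0):
    """Sample outputs of an untrained one-hidden-layer ReLU net under MC dropout.

    Weights and biases are drawn once and then held fixed; the only randomness
    across the returned samples comes from the Bernoulli dropout masks.
    (Architecture and scalings are illustrative assumptions.)
    """
    rng = np.random.default_rng(seed)
    d = x.shape[0]
    # Fixed random parameters with 1/sqrt(fan_in) scaling.
    W1 = rng.normal(0.0, 1.0 / np.sqrt(d), size=(width, d))
    b1 = rng.normal(0.0, 0.1, size=width)
    w2 = rng.normal(0.0, 1.0 / np.sqrt(width), size=width)

    # Hidden activations are deterministic once the weights are fixed.
    h = np.maximum(W1 @ x + b1, 0.0)

    # One Bernoulli(keep_prob) mask per sample, applied to the hidden units.
    masks = rng.random((n_samples, width)) < keep_prob
    # Inverted dropout: rescale by 1/keep_prob so the mean output is unbiased.
    samples = (masks * h) @ w2 / keep_prob

    deterministic = float(h @ w2)  # output with dropout switched off
    return samples, deterministic


def skew_and_excess_kurtosis(samples):
    """Return sample skewness and excess kurtosis (both 0 for a Gaussian)."""
    z = (samples - samples.mean()) / samples.std()
    return float(np.mean(z**3)), float(np.mean(z**4) - 3.0)


if __name__ == "__main__":
    x = np.ones(16)  # arbitrary fixed input
    samples, deterministic = mc_dropout_outputs(x)
    skew, kurt = skew_and_excess_kurtosis(samples)
    print("MC mean:", samples.mean(), "no-dropout output:", deterministic)
    print("skewness:", skew, "excess kurtosis:", kurt)
```

For a wide layer, both statistics should come out close to zero, in line with the Gaussian limit. Shrinking `width` to a handful of units makes the output distribution visibly discrete, but that is only the simplest finite-width effect; it is not the correlated pre-activation mechanism that the paper investigates.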

