ML, LG
Apr 30, 2018

Gaussian Process Behaviour in Wide Deep Neural Networks

arXiv:1804.11271v2
616 citations
Originality: Synthesis-oriented
AI Analysis

This paper provides incremental theoretical insight into the behaviour of deep networks for researchers, but does not introduce new practical methods.

The paper advances the theoretical understanding of deep neural networks by showing that random, fully connected feedforward networks with more than one hidden layer converge in distribution to Gaussian processes as their width increases, formalising and extending Neal's (1996) result. Empirically, finite Bayesian deep networks and the corresponding Gaussian processes agree closely on key predictive quantities in some cases.

Whilst deep neural networks have shown great empirical success, there is still much work to be done to understand their theoretical properties. In this paper, we study the relationship between random, wide, fully connected, feedforward networks with more than one hidden layer and Gaussian processes with a recursive kernel definition. We show that, under broad conditions, as we make the architecture increasingly wide, the implied random function converges in distribution to a Gaussian process, formalising and extending existing results by Neal (1996) to deep networks. To evaluate convergence rates empirically, we use maximum mean discrepancy. We then compare finite Bayesian deep networks from the literature to Gaussian processes in terms of the key predictive quantities of interest, finding that in some cases the agreement can be very close. We discuss the desirability of Gaussian process behaviour and review non-Gaussian alternative models from the literature.
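
To make the "recursive kernel definition" mentioned in the abstract concrete, the following is a minimal sketch of how such a kernel can be computed layer by layer. It is not taken from the paper: it assumes a ReLU nonlinearity (for which the layer-wise expectation has a well-known closed form, the arc-cosine kernel), weights with variance `sigma_w2 / fan_in`, and biases with variance `sigma_b2`. The function name `relu_gp_kernel` and its parameters are hypothetical.

```python
import numpy as np


def relu_gp_kernel(X, depth, sigma_w2=2.0, sigma_b2=0.0):
    """Limiting GP kernel of a random ReLU network with `depth` hidden layers.

    Assumptions (not from the abstract): ReLU nonlinearity, weight variance
    sigma_w2 / fan_in, bias variance sigma_b2. Inputs must be non-zero rows.
    """
    d = X.shape[1]
    # Covariance of the first-layer pre-activations.
    K = sigma_b2 + sigma_w2 * (X @ X.T) / d
    for _ in range(depth):
        std = np.sqrt(np.diag(K))
        norm = np.outer(std, std)
        # Clip to guard against rounding pushing the correlation outside [-1, 1].
        cos_theta = np.clip(K / norm, -1.0, 1.0)
        theta = np.arccos(cos_theta)
        # Closed form of E[relu(u) relu(v)] for (u, v) ~ N(0, K).
        expect = norm * (np.sin(theta) + (np.pi - theta) * cos_theta) / (2 * np.pi)
        # Covariance of the next layer's pre-activations.
        K = sigma_b2 + sigma_w2 * expect
    return K


if __name__ == "__main__":
    X = np.random.default_rng(0).normal(size=(4, 3))
    print(relu_gp_kernel(X, depth=3))
```

With the default values `sigma_w2=2.0` and `sigma_b2=0.0`, the diagonal of the kernel is unchanged from layer to layer, which is one quick sanity check on the recursion. Other nonlinearities would need a different expression for the layer-wise expectation.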

Code Implementations: 2 repos