ML · LG · Jan 3, 2020

Wide Neural Networks with Bottlenecks are Deep Gaussian Processes

arXiv:2001.00921v3 · 28 citations
AI Analysis

This work fills a theoretical gap in the wide-limit theory of neural networks by covering architectures with bottlenecks. The contribution is incremental but relevant to researchers in Bayesian deep learning and kernel methods.

The paper extends Gaussian process (GP) limit theory to Bayesian neural networks (BNNs) with narrow bottleneck layers. It shows that such networks converge to a composition of GPs, called a bottleneck NNGP. In the single-bottleneck case, the bottleneck induces dependence between the network's outputs and keeps the kernel from losing discriminative power, even at extreme post-bottleneck depths.

There has recently been much work on the "wide limit" of neural networks, where Bayesian neural networks (BNNs) are shown to converge to a Gaussian process (GP) as all hidden layers are sent to infinite width. However, these results do not apply to architectures that require one or more of the hidden layers to remain narrow. In this paper, we consider the wide limit of BNNs where some hidden layers, called "bottlenecks", are held at finite width. The result is a composition of GPs that we term a "bottleneck neural network Gaussian process" (bottleneck NNGP). Although intuitive, the subtlety of the proof is in showing that the wide limit of a composition of networks is in fact the composition of the limiting GPs. We also analyze theoretically a single-bottleneck NNGP, finding that the bottleneck induces dependence between the outputs of a multi-output network that persists through extreme post-bottleneck depths, and prevents the kernel of the network from losing discriminative power at extreme post-bottleneck depths.
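The composition described in the abstract can be illustrated with a minimal NumPy sketch. It is not the paper's code. It assumes fully connected ReLU layers, for which the NNGP kernel has the standard arc-cosine closed form, and it assumes the bottleneck values are passed as plain inputs to the post-bottleneck network. The function names, `sigma_w`, `sigma_b`, and `jitter` are hypothetical choices made for illustration.

```python
import numpy as np


def relu_nngp_kernel(X, depth=1, sigma_w=1.0, sigma_b=0.1):
    """NNGP kernel of a ReLU network with `depth` infinitely wide hidden layers.

    Assumption: weights ~ N(0, sigma_w^2 / fan_in), biases ~ N(0, sigma_b^2).
    """
    d = X.shape[1]
    # Covariance of the first-layer pre-activations.
    K = sigma_b**2 + sigma_w**2 * (X @ X.T) / d
    for _ in range(depth):
        std = np.sqrt(np.diag(K))
        norm = np.outer(std, std)
        cos_theta = np.clip(K / norm, -1.0, 1.0)
        theta = np.arccos(cos_theta)
        # Arc-cosine (degree 1) recursion: E[relu(u) relu(v)] in closed form.
        K = sigma_b**2 + sigma_w**2 / (2 * np.pi) * norm * (
            np.sin(theta) + (np.pi - theta) * cos_theta
        )
    return K


def sample_gp(K, n_outputs, rng, jitter=1e-6):
    """Draw `n_outputs` independent zero-mean GP samples with covariance K.

    Returns an array of shape (n_points, n_outputs).
    """
    n = K.shape[0]
    L = np.linalg.cholesky(K + jitter * np.eye(n))
    return L @ rng.standard_normal((n, n_outputs))


def sample_bottleneck_nngp(X, bottleneck_width, n_outputs, rng, depth=1):
    """Sample from a composition of two GPs joined by a finite-width bottleneck."""
    # Pre-bottleneck GP: one independent GP draw per bottleneck unit.
    H = sample_gp(relu_nngp_kernel(X, depth), bottleneck_width, rng)
    # Post-bottleneck GP: its kernel is evaluated on the random bottleneck values H.
    return sample_gp(relu_nngp_kernel(H, depth), n_outputs, rng)


rng = np.random.default_rng(0)
X = rng.standard_normal((6, 3))  # 6 inputs, 3 features
F = sample_bottleneck_nngp(X, bottleneck_width=2, n_outputs=4, rng=rng)
print(F.shape)
```

In this sketch all output columns of `F` are drawn from a kernel built on the same random `H`. They are independent only conditionally on `H`, which is one way to read the output dependence the abstract attributes to the bottleneck.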

