LG · ML · Aug 27, 2019

Finite size corrections for neural network Gaussian processes

arXiv:1908.10030v1 · 32 citations
AI Analysis

This work addresses a theoretical gap for researchers who model neural networks as Gaussian processes. Its contribution is incremental, however: it studies networks only at initialization and does not consider the effects of training.

The authors address the discrepancy between finite-width neural networks and their infinite-width Gaussian process limit. They derive that the output distribution at initialization is a Gaussian perturbed by a fourth-Hermite-polynomial term. The scale of this perturbation is inversely proportional to network width, and higher-order terms decay faster, which recovers the Edgeworth expansion.
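For reference, here is the textbook Edgeworth series that the summary alludes to. It is written under our own assumptions, not the paper's notation: at a fixed input, the output f is a normalized sum of N i.i.d. hidden-unit contributions z_i with zero mean and unit variance. If each z_i is the product of a symmetric output weight and an independent hidden activation, then z_i is symmetric and all of its odd cumulants vanish. This removes the usual skewness (He_3) term. The He_4 term at order 1/N is then the leading correction, and the next terms enter only at order 1/N^2. The symbols κ_r (cumulants of a single z_i), φ, and He_k (probabilists' Hermite polynomials) are our choices.

```latex
$$
f = \frac{1}{\sqrt{N}} \sum_{i=1}^{N} z_i, \qquad
\mathbb{E}[z_i] = 0, \quad \mathbb{E}[z_i^2] = 1, \quad \kappa_{2k+1}(z_i) = 0
$$

$$
p_N(y) = \varphi(y)\left[1 + \frac{\kappa_4}{24N}\,\mathrm{He}_4(y)
  + \frac{1}{N^2}\left(\frac{\kappa_6}{720}\,\mathrm{He}_6(y) + \frac{\kappa_4^2}{1152}\,\mathrm{He}_8(y)\right)
  + O(N^{-3})\right]
$$

$$
\varphi(y) = \frac{e^{-y^2/2}}{\sqrt{2\pi}}, \qquad \mathrm{He}_4(y) = y^4 - 6y^2 + 3
$$
```

Since the fourth cumulant of the normalized sum is κ_4/N, the He_4 coefficient equals the output's excess kurtosis divided by 24. This is why the perturbation scale goes as 1/N.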

There has been a recent surge of interest in modeling neural networks (NNs) as Gaussian processes. In the limit of infinite width, a NN becomes equivalent to a Gaussian process. Here we demonstrate that, for an ensemble of large, finite, fully connected networks with a single hidden layer and weights drawn from a symmetric distribution, the distribution of outputs at initialization is well described by a Gaussian perturbed by the fourth Hermite polynomial. We show that the scale of the perturbation is inversely proportional to the number of units in the NN and that higher-order terms decay more rapidly, thereby recovering the Edgeworth expansion. We conclude by observing that understanding how this perturbation changes under training would reveal the regimes in which the Gaussian process framework is valid for modeling NN behavior.
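The 1/N scaling can be checked numerically. Below is a minimal Monte Carlo sketch built on assumptions of our own, not the paper's experimental setup:

- a scalar input x = 1;
- ReLU hidden units;
- weights, biases, and output weights drawn i.i.d. from N(0, 1), a symmetric distribution;
- an output scaled by 1/sqrt(N).

The helper names (`sample_output`, `kurtosis_by_width`, `edgeworth_density`) are hypothetical. For this particular setup the single-unit excess kurtosis works out analytically to 15, so the output's excess kurtosis, and with it the He_4 coefficient, should shrink roughly like 15/N.

```python
import math
import random

# Hypothetical setup (an assumption, not taken from the paper): scalar input x = 1,
# pre-activation h = w*x + b, ReLU units, output weight v, with w, b, v ~ N(0, 1).
# Then h ~ N(0, 2), E[relu(h)^2] = 1, E[relu(h)^4] = 6, and for z = v*relu(h):
# E[z^2] = 1 and E[z^4] = 3 * 6 = 18, so the single-unit excess kurtosis is 18 - 3 = 15.
UNIT_EXCESS_KURTOSIS = 15.0


def he4(y):
    """Probabilists' Hermite polynomial He_4(y) = y^4 - 6y^2 + 3."""
    return y**4 - 6.0 * y**2 + 3.0


def sample_output(width, rng, x=1.0):
    """One draw of the (hypothetical) one-hidden-layer ReLU network output at input x."""
    total = 0.0
    for _ in range(width):
        w, b, v = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        total += v * max(0.0, w * x + b)
    # The 1/sqrt(width) scaling keeps the output variance at 1 for every width.
    return total / math.sqrt(width)


def excess_kurtosis(samples):
    """Sample excess kurtosis m4 / m2^2 - 3 (zero for a Gaussian)."""
    n = len(samples)
    mean = sum(samples) / n
    m2 = sum((s - mean) ** 2 for s in samples) / n
    m4 = sum((s - mean) ** 4 for s in samples) / n
    return m4 / m2**2 - 3.0


def edgeworth_density(y, width, unit_kurtosis=UNIT_EXCESS_KURTOSIS):
    """Gaussian density times the leading 1/N fourth-Hermite correction.

    The truncated series is only sensible at large width: with unit_kurtosis = 15
    it becomes negative near y^2 = 3 once width < 4.
    """
    gauss = math.exp(-0.5 * y * y) / math.sqrt(2.0 * math.pi)
    return gauss * (1.0 + unit_kurtosis / (24.0 * width) * he4(y))


def kurtosis_by_width(widths, n_samples=20000, seed=0):
    """Map each width to the sample excess kurtosis of n_samples network outputs."""
    rng = random.Random(seed)
    return {
        n: excess_kurtosis([sample_output(n, rng) for _ in range(n_samples)])
        for n in widths
    }


if __name__ == "__main__":
    for n, k in kurtosis_by_width([2, 8, 32]).items():
        predicted = UNIT_EXCESS_KURTOSIS / n
        print(f"N={n:3d}  sample excess kurtosis={k:.3f}  predicted 15/N={predicted:.3f}")
```

Under these assumptions, the printed sample values should track 15/N, up to Monte Carlo noise. The noise is largest at small N, where the single-unit distribution is heavy-tailed. A symmetric weight distribution keeps the sample skewness near zero, so the kurtosis (He_4) term is the first visible deviation from the Gaussian.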
