ML | LG | Jun 6, 2022

The Neural Covariance SDE: Shaped Infinite Depth-and-Width Networks at Initialization

Princeton | U of Toronto
arXiv:2206.02768v3 | 47 citations | h-index: 38
Originality: Incremental advance
AI Analysis

This work addresses a foundational question in neural network theory: it gives an if-and-only-if condition, stated in terms of the activation function, for exploding and vanishing norms in large shaped networks. The advance is incremental, since it builds on existing shaping methods.

The authors study the distribution of the random covariance matrix of a neural network at initialization in the shaped infinite-depth-and-width limit, and derive a stochastic differential equation (SDE) whose solutions closely match simulations of finite networks.

The logit outputs of a feedforward neural network at initialization are conditionally Gaussian, given a random covariance matrix defined by the penultimate layer. In this work, we study the distribution of this random matrix. Recent work has shown that shaping the activation function as network depth grows large is necessary for this covariance matrix to be non-degenerate. However, the current infinite-width-style understanding of this shaping method is unsatisfactory for large depth: infinite-width analyses ignore the microscopic fluctuations from layer to layer, but these fluctuations accumulate over many layers. To overcome this shortcoming, we study the random covariance matrix in the shaped infinite-depth-and-width limit. We identify the precise scaling of the activation function necessary to arrive at a non-trivial limit, and show that the random covariance matrix is governed by a stochastic differential equation (SDE) that we call the Neural Covariance SDE. Using simulations, we show that the SDE closely matches the distribution of the random covariance matrix of finite networks. Additionally, we recover an if-and-only-if condition for exploding and vanishing norms of large shaped networks based on the activation function.
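The object the abstract studies can be made concrete with a small simulation. The sketch below samples one random fully connected network and returns the Gram matrix of its last hidden layer, which plays the role of the random covariance matrix of the logits. Several things here are assumptions not stated in the text above: the leaky-ReLU-style activation with slopes 1 + c/sqrt(width), the rescaling that keeps the second moment at one, the helper names `shaped_activation` and `sample_covariance`, and the parameters `c_plus` and `c_minus`. It is meant as an illustration of the setup, not as the authors' code.

```python
import math
import random


def shaped_activation(x, s_plus, s_minus):
    # Piecewise-linear activation: slope s_plus for x > 0, s_minus otherwise.
    return s_plus * x if x > 0 else s_minus * x


def sample_covariance(inputs, width, depth, c_plus=0.0, c_minus=-1.0, seed=None):
    """Draw one random network and return V[a][b] = <h_a, h_b> / width,
    where h_a is the last hidden layer for input a (hypothetical helper)."""
    rng = random.Random(seed)
    # ASSUMPTION: slopes are shaped toward the identity at rate 1/sqrt(width).
    s_plus = 1.0 + c_plus / math.sqrt(width)
    s_minus = 1.0 + c_minus / math.sqrt(width)
    # Rescale so that E[phi(g)^2] = 1 for g ~ N(0, 1).
    scale = math.sqrt(2.0 / (s_plus ** 2 + s_minus ** 2))

    hidden = [list(x) for x in inputs]
    for _ in range(depth):
        fan_in = len(hidden[0])
        # One Gaussian weight matrix per layer, shared by all inputs.
        weights = [[rng.gauss(0.0, 1.0) for _ in range(fan_in)]
                   for _ in range(width)]
        hidden = [
            [
                scale * shaped_activation(
                    sum(w * v for w, v in zip(row, h)) / math.sqrt(fan_in),
                    s_plus,
                    s_minus,
                )
                for row in weights
            ]
            for h in hidden
        ]

    m = len(hidden)
    return [
        [sum(p * q for p, q in zip(hidden[a], hidden[b])) / width
         for b in range(m)]
        for a in range(m)
    ]


if __name__ == "__main__":
    # Two inputs whose squared norm equals their dimension.
    x = [[1.0, 1.0], [1.4, 0.2]]
    # Depth comparable to width: layer-to-layer fluctuations accumulate.
    draws = [sample_covariance(x, width=32, depth=32, seed=s) for s in range(20)]
    corr = [v[0][1] / math.sqrt(v[0][0] * v[1][1]) for v in draws]
    print("min/max correlation over 20 networks:", min(corr), max(corr))
```

Running the script with different seeds gives a different matrix each time. That spread is the randomness an infinite-width analysis would average away, and it is what the SDE is meant to describe when depth and width grow together.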

