ML · LG · PR · Jun 18, 2021

$α$-Stable convergence of heavy-tailed infinitely-wide neural networks

arXiv:2106.11064v1 · 8 citations
Originality: Synthesis-oriented
AI Analysis

The paper offers theoretical insight for researchers studying neural network initialization and heavy-tailed phenomena, but its contribution is incremental: it extends an earlier result of Favaro, Fortini, and Peluchetti (2020).

The paper studies the convergence behavior of infinitely-wide neural networks with heavy-tailed weight initializations, showing that, under suitable scaling, the pre-activation values converge to symmetric α-stable distributions.

We consider infinitely-wide multi-layer perceptrons (MLPs), which are limits of standard deep feed-forward neural networks. We assume that, for each layer, the weights of an MLP are initialized with i.i.d. samples from either a light-tailed (finite variance) or heavy-tailed distribution in the domain of attraction of a symmetric $α$-stable distribution, where $α\in(0,2]$ may depend on the layer. For the bias terms of the layer, we assume i.i.d. initializations with a symmetric $α$-stable distribution with the same $α$ parameter as that layer. We then extend a recent result of Favaro, Fortini, and Peluchetti (2020) to show that the vector of pre-activation values at all nodes of a given hidden layer converges in the limit, under a suitable scaling, to a vector of i.i.d. random variables with symmetric $α$-stable distributions.
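
The scaling described in the abstract can be illustrated numerically. The following is a minimal sketch, not the paper's construction: it assumes weights and biases that are exactly standard symmetric $α$-stable (the simplest member of the domain of attraction), a `tanh` activation, a single $α$ shared by both layers, and the normalization $n^{-1/α}$ for a hidden layer of width $n$. The function names `sample_sas` and `hidden_preactivations` are hypothetical, and the sampler is the standard Chambers-Mallows-Stuck method for the symmetric case.

```python
import numpy as np


def sample_sas(alpha, size, rng):
    """Draw standard symmetric alpha-stable samples (scale 1, 0 < alpha <= 2).

    Uses the Chambers-Mallows-Stuck representation for the symmetric case;
    the characteristic function of the output is exp(-|t|**alpha).
    """
    u = rng.uniform(-np.pi / 2, np.pi / 2, size)  # uniform angle
    w = rng.exponential(1.0, size)                # unit-rate exponential
    return (
        np.sin(alpha * u) / np.cos(u) ** (1.0 / alpha)
        * (np.cos((1.0 - alpha) * u) / w) ** ((1.0 - alpha) / alpha)
    )


def hidden_preactivations(x, alpha, width, n_out, rng):
    """Second-layer pre-activations of a two-layer MLP with stable weights.

    Assumed (illustrative) model:
        f1 = W1 x + b1
        f2 = width**(-1/alpha) * W2 tanh(f1) + b2
    Returns f2 and the scale of f2 conditional on the first layer.
    """
    d = x.shape[0]
    w1 = sample_sas(alpha, (width, d), rng)
    b1 = sample_sas(alpha, width, rng)
    phi = np.tanh(w1 @ x + b1)                    # bounded activations

    w2 = sample_sas(alpha, (n_out, width), rng)
    b2 = sample_sas(alpha, n_out, rng)
    f2 = width ** (-1.0 / alpha) * (w2 @ phi) + b2

    # A weighted sum of independent stable variables is again stable, with
    # scale**alpha equal to the sum of |weight|**alpha; the bias adds 1.
    scale = (np.mean(np.abs(phi) ** alpha) + 1.0) ** (1.0 / alpha)
    return f2, scale


rng = np.random.default_rng(0)
f2, scale = hidden_preactivations(np.array([1.0, -0.5, 2.0]), 1.5, 200, 20000, rng)
# Empirical characteristic function at t = 1 versus the stable value exp(-1).
print(np.mean(np.cos(f2 / scale)), np.exp(-1.0))
```

Because the weights here are exactly stable, the rescaled pre-activations are stable at every width once the first layer is fixed; increasing `width` only makes the scale concentrate. The heavier lifting in the paper concerns weights that are merely in the domain of attraction, which this sketch does not attempt to cover.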

