MLLGSTJul 7, 2020

Doubly infinite residual neural networks: a diffusion process approach

arXiv:2007.03253v25 citations
AI Analysis

This work addresses theoretical issues in training deep and wide neural networks, but it is incremental as it builds directly on prior results.

The paper tackles the problem of undesirable propagation properties in infinitely deep neural networks by extending prior work to convolutional ResNets and establishing backward-propagation results, and it investigates doubly infinite ResNets, showing that their dynamics converge to deterministic limits and providing analytical expressions for inference, highlighting limited expressive power under i.i.d. initializations.

Modern neural networks (NN) featuring a large number of layers (depth) and units per layer (width) have achieved a remarkable performance across many domains. While there exists a vast literature on the interplay between infinitely wide NNs and Gaussian processes, a little is known about analogous interplays with respect to infinitely deep NNs. NNs with independent and identically distributed (i.i.d.) initializations exhibit undesirable forward and backward propagation properties as the number of layers increases. To overcome these drawbacks, Peluchetti and Favaro (2020) considered fully-connected residual networks (ResNets) with network's parameters initialized by means of distributions that shrink as the number of layers increases, thus establishing an interplay between infinitely deep ResNets and solutions to stochastic differential equations, i.e. diffusion processes, and showing that infinitely deep ResNets does not suffer from undesirable forward-propagation properties. In this paper, we review the results of Peluchetti and Favaro (2020), extending them to convolutional ResNets, and we establish analogous backward-propagation results, which directly relate to the problem of training fully-connected deep ResNets. Then, we investigate the more general setting of doubly infinite NNs, where both network's width and network's depth grow unboundedly. We focus on doubly infinite fully-connected ResNets, for which we consider i.i.d. initializations. Under this setting, we show that the dynamics of quantities of interest converge, at initialization, to deterministic limits. This allow us to provide analytical expressions for inference, both in the case of weakly trained and fully trained ResNets. Our results highlight a limited expressive power of doubly infinite ResNets when the unscaled network's parameters are i.i.d. and the residual blocks are shallow.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes