ML · LG · Sep 24, 2018

Dynamical Isometry is Achieved in Residual Networks in a Universal Way for any Activation Function

arXiv:1809.08848v3 · 43 citations
Originality: Highly original
AI Analysis

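The result removes initialization as a confounding factor between the choice of activation function and the rate of learning in ResNets: the authors propose ensuring the same level of dynamical isometry at initialization, so that different activations can be compared on an equal footing.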

The paper shows that residual neural networks (ResNets) can achieve dynamical isometry at initialization regardless of the activation function. It derives a universal formula for the spectral density of the input-output Jacobian in the limit of large width and depth, and corroborates it with numerical simulations of random matrices and of ResNets applied to CIFAR-10 classification.

We demonstrate that in residual neural networks (ResNets) dynamical isometry is achievable irrespective of the activation function used. We do that by deriving, with the help of Free Probability and Random Matrix Theories, a universal formula for the spectral density of the input-output Jacobian at initialization, in the large network width and depth limit. The resulting singular value spectrum depends on a single parameter, which we calculate for a variety of popular activation functions by analyzing the signal propagation in the artificial neural network. We corroborate our results with numerical simulations of both random matrices and ResNets applied to the CIFAR-10 classification problem. Moreover, we study the consequence of this universal behavior for the initial and late phases of the learning processes. We conclude by drawing attention to the simple fact that initialization acts as a confounding factor between the choice of activation function and the rate of learning. We propose that in ResNets this can be resolved based on our results, by ensuring the same level of dynamical isometry at initialization.
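
The quantity at the center of the abstract, the singular value spectrum of the input-output Jacobian at initialization, can be probed numerically. The following is a minimal NumPy sketch under stated assumptions, not the authors' code: the residual update `x <- x + phi(W x)`, the weight variance `sigma_w**2 / (width * depth)`, the absence of biases, and the function name `resnet_jacobian_singular_values` are all illustrative choices that may differ from the parametrization used in the paper.

```python
import numpy as np


def resnet_jacobian_singular_values(width=200, depth=50, sigma_w=1.0,
                                    phi=np.tanh,
                                    dphi=lambda h: 1.0 - np.tanh(h) ** 2,
                                    seed=0):
    """Singular values of the input-output Jacobian of a random ResNet.

    Assumed toy model (not necessarily the paper's exact setup):
        x_{l+1} = x_l + phi(W_l x_l),
    with i.i.d. Gaussian weights of variance sigma_w**2 / (width * depth).
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(width)           # random input signal
    jac = np.eye(width)                      # running product of layer Jacobians
    std = sigma_w / np.sqrt(width * depth)   # assumed weight scaling

    for _ in range(depth):
        w = rng.standard_normal((width, width)) * std
        h = w @ x                            # pre-activation of this block
        # Layer Jacobian: d x_{l+1} / d x_l = I + diag(phi'(h)) W
        layer_jac = np.eye(width) + dphi(h)[:, None] * w
        jac = layer_jac @ jac                # chain rule
        x = x + phi(h)                       # residual update

    return np.linalg.svd(jac, compute_uv=False)


if __name__ == "__main__":
    sv = resnet_jacobian_singular_values()
    print("min / max singular value:", sv.min(), sv.max())
```

Passing a different `phi` and `dphi` pair (for example a ReLU and its derivative) and plotting a histogram of the returned values gives a rough way to compare spectra across activations. Dynamical isometry corresponds to the singular values concentrating near 1.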

