Asymptotic Analysis of Deep Residual Networks
The paper provides theoretical insights into deep learning dynamics, but the contribution is incremental, since it builds on the existing neural ODE literature.
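The paper investigates the asymptotic properties of deep residual networks as the number of layers increases. It identifies scaling regimes for trained weights that differ from those implicitly assumed in the neural ODE literature, and shows that the hidden state dynamics converge to an ODE, an SDE, or neither, depending on the regime. In the diffusive regime, the deep network limit is described by a class of SDEs.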
We investigate the asymptotic properties of deep residual networks (ResNets) as the number of layers increases. We first show the existence of scaling regimes for trained weights that are markedly different from those implicitly assumed in the neural ODE literature. We then study the convergence of the hidden state dynamics in these scaling regimes, showing that one may obtain an ordinary differential equation (ODE), a stochastic differential equation (SDE), or neither of these. In particular, our findings point to the existence of a diffusive regime in which the deep network limit is described by a class of SDEs. Finally, we derive the corresponding scaling limits for the backpropagation dynamics.
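The contrast between an ODE-like regime and a diffusive regime can be illustrated with a small numerical sketch. Everything below is an assumption made for illustration and is not taken from the paper: the recursion h_{k+1} = h_k + tanh(A_k h_k), the synthetic (untrained) weights, and the helper names `resnet_forward`, `ode_regime_weights`, `diffusive_regime_weights` and `quadratic_variation`. In the first regime the weights are a smooth function of depth scaled by 1/L. In the second they are i.i.d. Gaussian with standard deviation L^{-1/2}. The sum of squared hidden-state increments is one simple way to tell the two apart.

```python
import numpy as np


def resnet_forward(x, weights, activation=np.tanh):
    """Iterate h_{k+1} = h_k + activation(A_k h_k) and return all hidden states.

    The recursion is an assumed, simplified residual block (no bias term).
    """
    h = np.asarray(x, dtype=float)
    path = [h]
    for A in weights:
        h = h + activation(A @ h)
        path.append(h)
    return np.stack(path)  # shape (L + 1, d)


def ode_regime_weights(L, d, rng):
    """Assumed ODE-like scaling: A_k = A(k / L) / L with A smooth in depth."""
    A0 = rng.standard_normal((d, d))
    A1 = rng.standard_normal((d, d))
    t = np.arange(L) / L
    return [(np.cos(np.pi * s) * A0 + np.sin(np.pi * s) * A1) / L for s in t]


def diffusive_regime_weights(L, d, rng):
    """Assumed diffusive scaling: i.i.d. Gaussian entries with std L**-0.5."""
    return rng.standard_normal((L, d, d)) / np.sqrt(L)


def quadratic_variation(path):
    """Sum of squared increments of the hidden-state path over all layers."""
    return float(np.sum(np.diff(path, axis=0) ** 2))


if __name__ == "__main__":
    d = 4
    x0 = np.full(d, 0.5)
    for L in (1000, 4000):
        # Re-seeding keeps A0 and A1 identical across depths in the ODE regime.
        ode_path = resnet_forward(x0, ode_regime_weights(L, d, np.random.default_rng(0)))
        dif_path = resnet_forward(x0, diffusive_regime_weights(L, d, np.random.default_rng(1)))
        print(L, quadratic_variation(ode_path), quadratic_variation(dif_path))
```

Under these assumptions, the quadratic variation in the ODE-like regime should shrink roughly like 1/L as depth grows, whereas in the diffusive regime it should stay of order one, which is the qualitative signature of an SDE limit. The sketch says nothing about which regime trained weights actually fall into; that is the empirical question the paper addresses.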