LG DS ML Aug 5, 2020

Continuous-in-Depth Neural Networks

Alejandro F. Queiruga, N. Benjamin Erichson, Dane Taylor, Michael W. Mahoney

arXiv:2008.02389v1 19.756 citations Has Code

Originality: Incremental advance

AI Analysis

This work addresses a theoretical gap in the interpretation of ResNets as discretized differential equations and offers an architecture that shortens training and speeds up inference. The advance is incremental, since it builds on existing ODE-based interpretations of ResNets.

The paper tackles the problem that residual networks lack meaningful dynamical integrator properties. It introduces ContinuousNet, a continuous-in-depth generalization that allows flexible computational graphs and an incremental training scheme, which improves model quality and significantly decreases training time.

Recent work has attempted to interpret residual networks (ResNets) as one step of a forward Euler discretization of an ordinary differential equation, focusing mainly on syntactic algebraic similarities between the two systems. Discrete dynamical integrators of continuous dynamical systems, however, have a much richer structure. We first show that ResNets fail to be meaningful dynamical integrators in this richer sense. We then demonstrate that neural network models can learn to represent continuous dynamical systems, with this richer structure and properties, by embedding them into higher-order numerical integration schemes, such as the Runge-Kutta schemes. Based on these insights, we introduce ContinuousNet as a continuous-in-depth generalization of ResNet architectures. ContinuousNets exhibit an invariance to the particular computational graph manifestation. That is, the continuous-in-depth model can be evaluated with different discrete time step sizes, which changes the number of layers, and different numerical integration schemes, which changes the graph connectivity. We show that this can be used to develop an incremental-in-depth training scheme that improves model quality, while significantly decreasing training time. We also show that, once trained, the number of units in the computational graph can even be decreased, for faster inference with little-to-no accuracy drop.
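The idea that one vector field can be unrolled into different computational graphs can be illustrated with a small sketch. This is not the authors' ContinuousNet code: the names `make_field`, `euler_step`, `rk4_step`, and `integrate` are hypothetical, the weights are arbitrary fixed values rather than trained parameters, and the field is assumed to be independent of depth for simplicity. A forward Euler step has the same form as a residual block, `x + h * f(x)`, while a classical fourth-order Runge-Kutta step evaluates the same `f` four times per step.

```python
import numpy as np

def make_field():
    """Toy residual branch f(x, t) = tanh(W x).
    W is an arbitrary fixed matrix chosen for illustration, not a trained weight."""
    W = np.array([[0.0, -1.5],
                  [1.5,  0.0]])
    def f(x, t):
        # t is unused here: the field is assumed constant in depth.
        return np.tanh(W @ x)
    return f

def euler_step(f, x, t, h):
    # Same algebraic form as a residual block: x + h * f(x).
    return x + h * f(x, t)

def rk4_step(f, x, t, h):
    # Classical fourth-order Runge-Kutta: four evaluations of f per step.
    k1 = f(x, t)
    k2 = f(x + 0.5 * h * k1, t + 0.5 * h)
    k3 = f(x + 0.5 * h * k2, t + 0.5 * h)
    k4 = f(x + h * k3, t + h)
    return x + (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)

def integrate(f, x0, n_steps, step, t_end=1.0):
    """Unroll the field into n_steps 'layers' over the depth interval [0, t_end]."""
    h = t_end / n_steps
    x = np.asarray(x0, dtype=float)
    for i in range(n_steps):
        x = step(f, x, i * h, h)
    return x

f = make_field()
x0 = [1.0, 0.5]

# Evaluate the same field with two depths (8 and 16 steps) under each scheme.
euler_gap = np.linalg.norm(integrate(f, x0, 8, euler_step) - integrate(f, x0, 16, euler_step))
rk4_gap = np.linalg.norm(integrate(f, x0, 8, rk4_step) - integrate(f, x0, 16, rk4_step))
print("Euler output change when doubling depth:", euler_gap)
print("RK4 output change when doubling depth:  ", rk4_gap)
```

In this toy setting the higher-order scheme's output changes far less when the number of steps is doubled, which is the kind of insensitivity to step size that the abstract describes. Whether a trained network behaves this way is the empirical question the paper studies; the sketch only shows the mechanics of swapping step size and integration scheme around a fixed `f`.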
