LG, Aug 9, 2021

Training of deep residual networks with stochastic MG/OPT

arXiv:2108.04052v1, 4 citations
AI Analysis

This work addresses training efficiency and robustness for deep residual networks, offering incremental improvements in optimization methods.

The paper trains deep residual networks with a stochastic variant of the nonlinear multigrid method MG/OPT, building the multilevel hierarchy from the dynamical systems viewpoint of residual networks. It reports significant speed-ups and improved robustness on MNIST, and suggests that multilevel training can also serve as a pruning technique.

We train deep residual networks with a stochastic variant of the nonlinear multigrid method MG/OPT. To build the multilevel hierarchy, we use the dynamical systems viewpoint specific to residual networks. We report significant speed-ups and additional robustness for training deep residual networks on MNIST. Our numerical experiments also indicate that multilevel training can be used as a pruning technique, as many of the auxiliary networks have accuracies comparable to the original network.
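The dynamical systems viewpoint mentioned in the abstract can be illustrated with a minimal sketch. It treats a residual network as a forward Euler discretization of an ODE, so a coarser (shallower) network is obtained by using fewer, larger time steps. Everything here is an assumption made for illustration: the `tanh` layer form, the helper names `resnet_forward`, `restrict`, and `prolong`, and the simple injection and piecewise-constant transfer operators. The abstract does not state which operators the paper uses, and the MG/OPT correction step itself is not shown.

```python
import numpy as np

def resnet_forward(x, weights, h):
    """Forward Euler view of a residual network: x_{k+1} = x_k + h * tanh(W_k x_k)."""
    for W in weights:
        x = x + h * np.tanh(W @ x)
    return x

def restrict(weights, h):
    """Hypothetical restriction: keep every second layer and double the step size."""
    return weights[::2], 2.0 * h

def prolong(coarse_weights, h_coarse):
    """Hypothetical prolongation: copy each coarse layer twice and halve the step size."""
    fine = [W.copy() for W in coarse_weights for _ in range(2)]
    return fine, h_coarse / 2.0

rng = np.random.default_rng(0)
dim, depth, T = 4, 8, 1.0          # feature width, number of layers, final "time"
h = T / depth                      # step size of the fine network
fine_weights = [0.1 * rng.standard_normal((dim, dim)) for _ in range(depth)]

# Build one coarser level: half the layers, same final time T.
coarse_weights, h_coarse = restrict(fine_weights, h)

x0 = rng.standard_normal(dim)
y_fine = resnet_forward(x0, fine_weights, h)
y_coarse = resnet_forward(x0, coarse_weights, h_coarse)

# Map the coarse parameters back to the fine depth.
refined_weights, h_refined = prolong(coarse_weights, h_coarse)
```

Both networks integrate over the same interval `[0, T]`, which is what makes the shallower network a plausible stand-in for the deeper one on a coarse level. Repeating `restrict` would give further levels of the hierarchy.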

Code Implementations: 1 repo