Training of deep residual networks with stochastic MG/OPT
This work addresses training efficiency and robustness for deep residual networks, offering incremental improvements in optimization methods.
The paper tackles the training of deep residual networks by introducing a stochastic variant of the nonlinear multigrid method MG/OPT and by using the dynamical systems viewpoint of residual networks to construct the multilevel hierarchy. It reports significant speed-ups and improved robustness on MNIST, and its experiments suggest that multilevel training also has potential as a pruning technique.
We train deep residual networks with a stochastic variant of the nonlinear multigrid method MG/OPT. To build the multilevel hierarchy, we use the dynamical systems viewpoint specific to residual networks. We report significant speed-ups and additional robustness when training deep residual networks on MNIST. Our numerical experiments also indicate that multilevel training can be used as a pruning technique, as many of the auxiliary networks have accuracies comparable to the original network.
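The abstract does not specify the transfer operators or the coarse-level objective, so the following is only a minimal NumPy sketch under stated assumptions of how the dynamical systems viewpoint can yield a layer hierarchy and of one two-level MG/OPT cycle. A residual network x_{k+1} = x_k + h tanh(W_k x_k + b_k) is read as a forward Euler discretization of an ODE with step h. The coarse network is assumed to keep every other layer (injection) and to double the step to 2h, and coarse corrections are assumed to be prolonged by copying each coarse layer to the two fine layers it spans. The coarse objective subtracts the first-order coherence term used in MG/OPT, a backtracking line search guards the fine-level update, and "stochastic" is taken to mean that each cycle works on one mini-batch. The tanh activation, squared-error loss, gradient-descent smoother, synthetic data, and the hypothetical helpers `loss_and_grad`, `prolong`, `restrict_grad`, `gd` and `mgopt_cycle` are illustrative choices, not the authors' implementation.

```python
import numpy as np

# Assumption: residual block x_{k+1} = x_k + h * tanh(W_k x_k + b_k),
# i.e. forward Euler for dx/dt = tanh(W(t) x + b(t)) with step h = T / N.
def loss_and_grad(W, b, X, Y, h):
    """Squared-error loss of the ResNet and its gradients w.r.t. W (N,d,d) and b (N,d)."""
    xs, acts = [X], []
    for k in range(len(W)):
        a = np.tanh(xs[-1] @ W[k].T + b[k])
        acts.append(a)
        xs.append(xs[-1] + h * a)
    r = xs[-1] - Y
    loss = 0.5 * np.sum(r ** 2) / X.shape[0]
    g = r / X.shape[0]                        # dL/dx_N
    gW, gb = np.zeros_like(W), np.zeros_like(b)
    for k in reversed(range(len(W))):
        dz = g * h * (1.0 - acts[k] ** 2)     # backprop through h * tanh(.)
        gW[k] = dz.T @ xs[k]
        gb[k] = dz.sum(axis=0)
        g = g + dz @ W[k]                     # skip connection + residual branch
    return loss, gW, gb

def prolong(e):
    """P (assumed piecewise constant): coarse layer k -> fine layers 2k and 2k+1."""
    return np.repeat(e, 2, axis=0)

def restrict_grad(g):
    """P^T: sum the gradients of each pair of fine layers."""
    return g.reshape(g.shape[0] // 2, 2, *g.shape[1:]).sum(axis=1)

def gd(W, b, X, Y, h, lr, steps):
    """Plain gradient-descent smoother (an assumption; other optimizers could be used)."""
    for _ in range(steps):
        _, gW, gb = loss_and_grad(W, b, X, Y, h)
        W, b = W - lr * gW, b - lr * gb
    return W, b

def mgopt_cycle(W, b, X, Y, h, lr=0.1, nu=2, coarse_steps=4):
    """One two-level MG/OPT cycle on a single mini-batch (X, Y)."""
    W, b = gd(W, b, X, Y, h, lr, nu)                    # pre-smoothing
    f0, gW, gb = loss_and_grad(W, b, X, Y, h)
    WH0, bH0 = W[::2].copy(), b[::2].copy()             # injection: keep even layers
    _, gWH, gbH = loss_and_grad(WH0, bH0, X, Y, 2 * h)  # N/2 layers, step 2h
    # First-order coherence: coarse-objective gradient at WH0 equals P^T grad f_h.
    vW, vb = gWH - restrict_grad(gW), gbH - restrict_grad(gb)
    WH, bH = WH0.copy(), bH0.copy()
    for _ in range(coarse_steps):                       # minimize f_H(theta) - <v, theta>
        _, gWH, gbH = loss_and_grad(WH, bH, X, Y, 2 * h)
        WH, bH = WH - lr * (gWH - vW), bH - lr * (gbH - vb)
    eW, eb = prolong(WH - WH0), prolong(bH - bH0)       # prolonged coarse correction
    for alpha in (1.0, 0.5, 0.25, 0.125):               # backtracking: no increase on this batch
        Wt, bt = W + alpha * eW, b + alpha * eb
        if loss_and_grad(Wt, bt, X, Y, h)[0] <= f0:
            W, b = Wt, bt
            break
    return gd(W, b, X, Y, h, lr, nu)                    # post-smoothing

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    N, d, n = 8, 4, 256                                 # N must be even
    h = 1.0 / N                                         # final time T = 1
    X = rng.standard_normal((n, d))
    Y = np.tanh(X @ rng.standard_normal((d, d)))        # synthetic targets, not MNIST
    W, b = 0.1 * rng.standard_normal((N, d, d)), np.zeros((N, d))
    for _ in range(50):
        idx = rng.choice(n, size=32, replace=False)     # stochastic: one batch per cycle
        W, b = mgopt_cycle(W, b, X[idx], Y[idx], h)
    print("fine loss:", loss_and_grad(W, b, X, Y, h)[0])
    print("coarse (N/2-layer) loss:", loss_and_grad(W[::2], b[::2], X, Y, 2 * h)[0])
```

Because the coarse network (`W[::2]`, `b[::2]`, step 2h) is itself a complete residual network with half the layers, its loss or accuracy can be evaluated directly, as the last line does; this is one plausible reading of the abstract's remark that auxiliary networks can serve as pruned versions of the original.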