LG, OC, ML · Apr 13, 2020

Multilevel Minimization for Deep Residual Networks

arXiv:2004.06196v1 · 19 citations
AI Analysis

This work addresses training efficiency for deep learning practitioners, but it is incremental as it builds on existing ResNet and optimal control methods.

The authors reduce the training time of deep residual networks by introducing a multilevel minimization framework based on a dynamical systems viewpoint, reporting a speedup of more than a factor of three at the same validation accuracy.

We present a new multilevel minimization framework for the training of deep residual networks (ResNets), which has the potential to significantly reduce training time and effort. Our framework is based on the dynamical systems viewpoint, which formulates a ResNet as the discretization of an initial value problem. The training process is then formulated as a time-dependent optimal control problem, which we discretize using different time-discretization parameters, eventually generating a multilevel hierarchy of auxiliary networks with different resolutions. The training of the original ResNet is then enhanced by training the auxiliary networks with reduced resolutions. By design, our framework is independent of the training strategy chosen on each level of the multilevel hierarchy. By means of numerical examples, we analyze the convergence behavior of the proposed method and demonstrate its robustness. For our examples, we employ multilevel gradient-based methods. Comparisons with standard single-level methods show a speedup of more than a factor of three while achieving the same validation accuracy.
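To make the dynamical systems viewpoint concrete, the sketch below treats a ResNet forward pass as forward Euler steps for an initial value problem on a fixed time interval, so that fewer layers simply means a larger step size. Everything specific here is an assumption for illustration and is not taken from the paper: the layer form `tanh(W x + b)`, the names `resnet_forward`, `coarsen`, and `prolong`, the restriction that keeps every second layer, and the piecewise-constant-in-time prolongation.

```python
import numpy as np

def resnet_forward(x, weights, biases, T=1.0):
    """Forward Euler for dx/dt = tanh(W(t) x + b(t)) on [0, T].

    Each residual block is one Euler step; the step size h = T / depth,
    so a shallower network is a coarser discretization of the same problem.
    """
    h = T / len(weights)
    for W, b in zip(weights, biases):
        x = x + h * np.tanh(W @ x + b)
    return x

def coarsen(params):
    """Hypothetical restriction: keep every second layer's parameters."""
    return [p.copy() for p in params[::2]]

def prolong(coarse_params):
    """Hypothetical prolongation: repeat each coarse layer twice
    (piecewise-constant interpolation of the parameters in time)."""
    return [p.copy() for p in coarse_params for _ in range(2)]

# Illustrative two-level hierarchy: 8 fine layers, 4 coarse layers.
rng = np.random.default_rng(0)
width, depth = 4, 8
fine_W = [0.1 * rng.standard_normal((width, width)) for _ in range(depth)]
fine_b = [np.zeros(width) for _ in range(depth)]

coarse_W, coarse_b = coarsen(fine_W), coarsen(fine_b)

x0 = rng.standard_normal(width)
y_fine = resnet_forward(x0, fine_W, fine_b)        # step size T / 8
y_coarse = resnet_forward(x0, coarse_W, coarse_b)  # step size T / 4

# After (hypothetically) training the coarse network, its parameters
# can initialize the fine network before fine-level training resumes.
init_W, init_b = prolong(coarse_W), prolong(coarse_b)
```

In a multilevel scheme of this kind, most optimization steps would run on the cheaper coarse network, with the fine network refined from the prolonged parameters. How the levels are actually coupled, and which optimizer runs on each level, is left open here.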

