LG · Oct 3, 2022

Block-wise Training of Residual Networks via the Minimizing Movement Scheme

arXiv:2210.00949v2 · 1 citation · h-index: 52
Originality: Incremental advance
AI Analysis

This work addresses training inefficiencies in neural networks, particularly in constrained settings such as on-device training. The contribution is incremental, since it builds on existing layer-wise optimization approaches.

The paper tackles the shortcomings of end-to-end backpropagation, such as high memory requirements and locking issues, by developing a block-wise training method for ResNets that improves test accuracy on classification tasks.

End-to-end backpropagation has a few shortcomings: it requires loading the entire model during training, which can be impossible in constrained settings, and suffers from three locking problems (forward locking, update locking and backward locking), which prohibit training the layers in parallel. Solving layer-wise optimization problems can address these problems and has been used in on-device training of neural networks. We develop a layer-wise training method, particularly well-adapted to ResNets, inspired by the minimizing movement scheme for gradient flows in distribution space. The method amounts to a kinetic energy regularization of each block that makes the blocks optimal transport maps and endows them with regularity. It works by alleviating the stagnation problem observed in layer-wise training, whereby greedily trained early layers overfit and deeper layers stop increasing test accuracy after a certain depth. We show on classification tasks that the test accuracy of block-wise trained ResNets is improved when using our method, whether the blocks are trained sequentially or in parallel.
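To make the kinetic-energy idea concrete, here is a minimal toy sketch. It is not the authors' method or code. In the minimizing movement scheme, a step from a distribution ρ_k solves ρ_{k+1} ∈ argmin_ρ F(ρ) + W_2²(ρ_k, ρ)/(2τ). When a residual block T(x) = x + r(x) pushes the features forward, the transport cost W_2²(ρ_k, T#ρ_k) is at most the mean squared displacement E‖r(x)‖², and that is the penalty used below. Everything else is an assumption made for illustration: 1-D features, affine residual blocks r(x) = a·x + b, a squared distance to a class target (±1) standing in for a classification loss with an auxiliary head, plain gradient descent, and the hypothetical helper names train_block, task_loss, kinetic and train_blockwise. Blocks are trained greedily, one after another, on the frozen output of the previous blocks (the sequential variant).

```python
# Hypothetical toy sketch (not the paper's implementation): greedy block-wise
# training of 1-D affine residual blocks T(x) = x + a*x + b, each with a
# kinetic-energy penalty mean(r^2) / (2*tau) on its displacement r = a*x + b.
# The surrogate loss 0.5*(y - c)^2 with class targets c = -1/+1 is an assumption.


def train_block(xs, cs, tau, lr=0.1, steps=500):
    """Fit (a, b) for one block by gradient descent on
    J(a, b) = mean 0.5*(x + r - c)^2 + mean r^2 / (2*tau), with r = a*x + b."""
    a, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        ga = gb = 0.0
        for x, c in zip(xs, cs):
            r = a * x + b              # displacement of this point by the block
            g = (x + r - c) + r / tau  # dJ/dr for this sample (task term + kinetic term)
            ga += g * x                # chain rule: dr/da = x
            gb += g                    # chain rule: dr/db = 1
        a -= lr * ga / n
        b -= lr * gb / n
    return a, b


def task_loss(xs, cs):
    """Mean surrogate loss 0.5*(x - c)^2 of the current features."""
    return sum(0.5 * (x - c) ** 2 for x, c in zip(xs, cs)) / len(xs)


def kinetic(xs, a, b):
    """Mean squared displacement of the block, an upper bound on W_2^2."""
    return sum((a * x + b) ** 2 for x in xs) / len(xs)


def train_blockwise(xs, cs, tau, depth):
    """Train `depth` blocks sequentially; each sees the frozen output of the previous ones."""
    losses = [task_loss(xs, cs)]
    energies = []
    for _ in range(depth):
        a, b = train_block(xs, cs, tau)
        energies.append(kinetic(xs, a, b))
        xs = [x + a * x + b for x in xs]  # push features through the trained block
        losses.append(task_loss(xs, cs))
    return losses, energies


if __name__ == "__main__":
    # Two classes on either side of zero; their targets are -1 and +1.
    xs = [-2.5, -2.0, -1.5, 1.5, 2.0, 2.5]
    cs = [-1.0, -1.0, -1.0, 1.0, 1.0, 1.0]
    for tau in (1.0, 1e6):  # tau = 1e6 approximates no regularization
        losses, energies = train_blockwise(xs, cs, tau, depth=5)
        print(f"tau={tau:g} first-block kinetic energy={energies[0]:.4f}")
        print("  task loss by depth:", [round(l, 4) for l in losses])
```

With τ = 1 the first block moves the features much less (lower kinetic energy) than with τ = 10⁶, which approximates no regularization. In this toy, the unregularized first block already reaches the best affine fit, so later blocks barely move the features, while the regularized run spreads similar progress over several smaller steps. The toy says nothing about test accuracy, which is what the paper actually measures.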
