CV | May 18, 2018

Norm-Preservation: Why Residual Networks Can Become Extremely Deep?

arXiv:1805.07477v5 | 84 citations
Originality: Incremental advance
AI Analysis

This work addresses the challenge of training very deep neural networks. It offers theoretical insights and a practical modification (Procrustes ResNets) to guide the design of deeper architectures, though the advance is incremental because it builds on existing ResNet concepts.

The paper examines why residual networks (ResNets) can be trained effectively at extreme depths. By analyzing skip connections, it proves that they help preserve the norm of the gradient and that this norm-preservation strengthens as more residual blocks are stacked. It also shows, with supporting empirical evidence, that enforcing further norm-preservation by regularizing the singular values of the convolution operator improves learning dynamics and classification performance.

Augmenting neural networks with skip connections, as introduced in the so-called ResNet architecture, surprised the community by enabling the training of networks of more than 1,000 layers with significant performance gains. This paper deciphers ResNet by analyzing the effect of skip connections, and puts forward new theoretical results on the advantages of identity skip connections in neural networks. We prove that the skip connections in the residual blocks facilitate preserving the norm of the gradient, and lead to stable back-propagation, which is desirable from an optimization perspective. We also show that, perhaps surprisingly, as more residual blocks are stacked, the norm-preservation of the network is enhanced. Our theoretical arguments are supported by extensive empirical evidence. Can we push for extra norm-preservation? We answer this question by proposing an efficient method to regularize the singular values of the convolution operator and making the ResNet's transition layers extra norm-preserving. Our numerical investigations demonstrate that the learning dynamics and the classification performance of ResNet can be improved by making it even more norm-preserving. Our results and the introduced modification for ResNet, referred to as Procrustes ResNets, can be used as a guide for training deeper networks and can also inspire new deeper architectures.
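
The norm-preservation claim in the abstract can be illustrated with a small numerical sketch. This is not the paper's code or its Procrustes method. It assumes a toy linear residual branch F(x) = W x with small random weights, and the names `gradient_norm_ratio`, `W`, `plain_ratio`, and `residual_ratio` are hypothetical. Back-propagation through a layer multiplies the incoming gradient by the transpose of the layer's Jacobian, so the ratio of gradient norms before and after that product shows how well the layer preserves the norm.

```python
import numpy as np


def gradient_norm_ratio(jacobian, grad_out):
    """Return ||J^T g|| / ||g||: the factor by which one layer rescales
    the gradient norm during back-propagation."""
    grad_in = jacobian.T @ grad_out
    return np.linalg.norm(grad_in) / np.linalg.norm(grad_out)


rng = np.random.default_rng(0)
dim = 64

# Assumption for illustration: a linear branch F(x) = W x with small weights.
W = 0.01 * rng.standard_normal((dim, dim))
grad_out = rng.standard_normal(dim)

# Plain layer y = W x has Jacobian W.
plain_ratio = gradient_norm_ratio(W, grad_out)

# Residual block y = x + W x has Jacobian I + W (identity skip connection).
residual_ratio = gradient_norm_ratio(np.eye(dim) + W, grad_out)

print(f"plain layer ratio:    {plain_ratio:.3f}")
print(f"residual block ratio: {residual_ratio:.3f}")
```

Under these assumptions the residual block's ratio stays close to 1 while the plain layer shrinks the gradient, because the identity term dominates the Jacobian when the branch weights are small. Real residual blocks are nonlinear and convolutional, so this only conveys the intuition, not the paper's proof.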

Code Implementations: 1 repo
