LG · NA · ML · May 30, 2021

Overparameterization of deep ResNet: zero loss and mean-field analysis

Zhiyan Ding, Shi Chen, Qin Li, Stephen Wright

arXiv:2105.14417v3 · 13.1 · 28 citations

Originality: Incremental advance

AI Analysis

This work addresses why deep ResNets can be trained to fit data perfectly, a question of interest to researchers in machine learning theory. It is an incremental advance, building on existing mean-field analyses of neural networks.

The authors study how gradient descent drives deep ResNets to zero training loss by analyzing it in the overparameterized limit of infinite depth and width. They prove that, under certain assumptions, the training dynamics converge to a zero-loss solution, and they estimate the network size needed to push the loss below a given threshold with high probability.

Finding parameters in a deep neural network (NN) that fit training data is a nonconvex optimization problem, but a basic first-order optimization method (gradient descent) finds a global optimizer with perfect fit (zero-loss) in many practical situations. We examine this phenomenon for the case of Residual Neural Networks (ResNet) with smooth activation functions in a limiting regime in which both the number of layers (depth) and the number of weights in each layer (width) go to infinity. First, we use a mean-field-limit argument to prove that the gradient descent for parameter training becomes a gradient flow for a probability distribution that is characterized by a partial differential equation (PDE) in the large-NN limit. Next, we show that under certain assumptions, the solution to the PDE converges in the training time to a zero-loss solution. Together, these results suggest that the training of the ResNet gives a near-zero loss if the ResNet is large enough. We give estimates of the depth and width needed to reduce the loss below a given threshold, with high probability.
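The mean-field picture in the abstract can be illustrated with a toy model. The sketch below is not the authors' construction; it assumes a scalar hidden state, residual blocks of the form u·tanh(w·z + b), a 1/(L·M) scaling of each layer's sum over its M neurons, a squared loss on the final state, and made-up 1-D data. The names `make_params`, `forward`, `loss`, `gd_step`, and `train` are hypothetical. Each (u, w, b) triple plays the role of one "particle"; multiplying the learning rate by L·M is the usual mean-field time rescaling that keeps every particle moving at O(1) speed as L and M grow.

```python
import math
import random


def make_params(L, M, rng):
    # One (u, w, b) triple ("particle") per neuron: L layers x M neurons per layer.
    return [[[rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)]
             for _ in range(M)] for _ in range(L)]


def forward(params, x):
    """Return hidden states z_0..z_L for a scalar input x.

    Assumed mean-field ResNet: z_{l+1} = z_l + (1/(L*M)) * sum_m u*tanh(w*z_l + b).
    """
    L, M = len(params), len(params[0])
    zs = [x]
    for layer in params:
        z = zs[-1]
        step = sum(u * math.tanh(w * z + b) for u, w, b in layer)
        zs.append(z + step / (L * M))
    return zs


def loss(params, data):
    # Half mean squared error on the final state z_L.
    return sum((forward(params, x)[-1] - y) ** 2 for x, y in data) / (2 * len(data))


def gd_step(params, data, lr):
    """One full-batch gradient-descent step with the gradient rescaled by L*M."""
    L, M = len(params), len(params[0])
    scale = 1.0 / (L * M)
    grads = [[[0.0, 0.0, 0.0] for _ in range(M)] for _ in range(L)]
    for x, y in data:
        zs = forward(params, x)
        p = (zs[-1] - y) / len(data)  # dLoss/dz_L for this sample
        for l in range(L - 1, -1, -1):
            z = zs[l]  # here p holds dLoss/dz_{l+1}
            dz = 1.0  # accumulates d z_{l+1} / d z_l
            for m, (u, w, b) in enumerate(params[l]):
                t = math.tanh(w * z + b)
                dt = 1.0 - t * t
                g = grads[l][m]
                g[0] += p * scale * t           # d/du
                g[1] += p * scale * u * dt * z  # d/dw
                g[2] += p * scale * u * dt      # d/db
                dz += scale * u * w * dt
            p *= dz  # back-propagate the adjoint to z_l
    # Mean-field time rescaling: step size multiplied by L*M.
    for l in range(L):
        for m in range(M):
            for k in range(3):
                params[l][m][k] -= lr * (L * M) * grads[l][m][k]


def train(L=10, M=20, steps=200, lr=0.2, seed=0):
    rng = random.Random(seed)
    params = make_params(L, M, rng)
    # Hypothetical data with a monotone target, which a 1-D residual flow can fit.
    data = [(x, 0.5 * x + 0.2) for x in (-1.0, -0.5, 0.0, 0.5, 1.0)]
    history = [loss(params, data)]
    for _ in range(steps):
        gd_step(params, data, lr)
        history.append(loss(params, data))
    return history
```

Calling `train()` returns the training loss after each step. Raising L and M moves this discrete system toward the infinite-depth, infinite-width regime the paper studies, but the toy makes no claim about the paper's assumptions, convergence rates, or size estimates.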

