ML LG STJan 13, 2021

Learning with Gradient Descent and Weakly Convex Losses

arXiv:2101.04968v216.019 citations

Originality Incremental advance

AI Analysis

This work addresses theoretical guarantees for gradient descent in non-convex optimization, particularly for neural networks, but it is incremental as it builds on prior stability and generalization frameworks.

The paper tackles the problem of analyzing gradient descent's learning performance with weakly convex empirical risks, proving generalization error bounds that hold under a wider range of step sizes and achieving out-of-sample guarantees by decomposing test error into bounded components. For two-layer neural networks, it demonstrates that local weak convexity can be controlled via network scaling, leading to test error guarantees under complexity assumptions and insights into implicit bias, supported by experiments.

We study the learning performance of gradient descent when the empirical risk is weakly convex, namely, the smallest negative eigenvalue of the empirical risk's Hessian is bounded in magnitude. By showing that this eigenvalue can control the stability of gradient descent, generalisation error bounds are proven that hold under a wider range of step sizes compared to previous work. Out of sample guarantees are then achieved by decomposing the test error into generalisation, optimisation and approximation errors, each of which can be bounded and traded off with respect to algorithmic parameters, sample size and magnitude of this eigenvalue. In the case of a two layer neural network, we demonstrate that the empirical risk can satisfy a notion of local weak convexity, specifically, the Hessian's smallest eigenvalue during training can be controlled by the normalisation of the layers, i.e., network scaling. This allows test error guarantees to then be achieved when the population risk minimiser satisfies a complexity assumption. By trading off the network complexity and scaling, insights are gained into the implicit bias of neural network scaling, which are further supported by experimental findings.

View on arXiv PDF

Similar