ML LGSep 17, 2021

AdaLoss: A computationally-efficient and provably convergent adaptive gradient method

Xiaoxia Wu, Yuege Xie, Simon Du, Rachel Ward

arXiv:2109.08282v16.37 citationsh-index: 32

Originality Incremental advance

AI Analysis

This work addresses the need for robust and efficient optimization methods in machine learning, though it appears incremental as it builds on existing adaptive gradient techniques with new theoretical guarantees.

The authors tackled the problem of designing an adaptive learning rate schedule for gradient descent that is computationally efficient and provably convergent, achieving linear convergence in linear regression and polynomial-time convergence to global minima in over-parameterized neural networks under certain width conditions.

We propose a computationally-friendly adaptive learning rate schedule, "AdaLoss", which directly uses the information of the loss function to adjust the stepsize in gradient descent methods. We prove that this schedule enjoys linear convergence in linear regression. Moreover, we provide a linear convergence guarantee over the non-convex regime, in the context of two-layer over-parameterized neural networks. If the width of the first-hidden layer in the two-layer networks is sufficiently large (polynomially), then AdaLoss converges robustly \emph{to the global minimum} in polynomial time. We numerically verify the theoretical results and extend the scope of the numerical experiments by considering applications in LSTM models for text clarification and policy gradients for control problems.

View on arXiv PDF

Similar