LG, Nov 28, 2020

Quasi-Newton's method in the class gradient defined high-curvature subspace

Mark Tuddenham, Adam Prügel-Bennett, Jonathan Hare

arXiv:2012.01938v15.04 citations

Originality: Synthesis-oriented

AI Analysis

This work addresses accelerating optimization for deep learning classification models, a problem relevant to researchers and practitioners who want to improve training efficiency. Its finding is negative: a naive implementation of a seemingly obvious strategy slows convergence rather than speeding it up, which serves as a cautionary result for future method development.

This paper investigates applying Newton's method within a high-curvature subspace of the loss landscape, specifically the subspace spanned by the logit gradients for each class in a classification problem. The authors find that a naive implementation of this strategy, which combines Newton's method in the high-curvature subspace with stochastic gradient descent in the co-space, slows down convergence.

Classification problems using deep learning have been shown to have a high-curvature subspace in the loss landscape equal in dimension to the number of classes. Moreover, this subspace corresponds to the subspace spanned by the logit gradients for each class. An obvious strategy to speed up optimisation would be to use Newton's method in the high-curvature subspace and stochastic gradient descent in the co-space. We show that a naive implementation actually slows down convergence and we speculate why this might be.
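The hybrid update described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `subspace_newton_step`, its parameters, the use of an exact Hessian-vector product, and the toy quadratic loss are all hypothetical choices made for this example. It takes a Newton step inside the span of the per-class logit gradients and a plain gradient step in the orthogonal complement (the co-space).

```python
import numpy as np


def subspace_newton_step(theta, grad, logit_grads, hvp, lr=0.1, damping=0.0):
    """One hybrid update: Newton inside span(logit_grads), gradient descent outside.

    theta:       (d,) parameter vector
    grad:        (d,) loss gradient at theta
    logit_grads: (d, C) matrix, one column per class logit gradient
                 (assumed here to have full column rank)
    hvp:         callable v -> H @ v, a Hessian-vector product at theta
    """
    # Orthonormal basis Q (d x C) for the subspace spanned by the logit gradients.
    Q, _ = np.linalg.qr(logit_grads)

    # Hessian restricted to the subspace: H_sub = Q^T H Q (C x C), symmetrized.
    HQ = np.stack([hvp(q) for q in Q.T], axis=1)
    H_sub = Q.T @ HQ
    H_sub = 0.5 * (H_sub + H_sub.T)

    # Split the gradient into its subspace part and its co-space part.
    g_sub = Q.T @ grad
    g_perp = grad - Q @ g_sub

    # Newton step in the subspace (optionally damped), mapped back to R^d.
    newton = Q @ np.linalg.solve(H_sub + damping * np.eye(H_sub.shape[0]), g_sub)

    # Gradient step with learning rate lr in the co-space.
    return theta - newton - lr * g_perp


# Toy quadratic loss 0.5 * theta^T A theta with two high-curvature directions.
A = np.diag([100.0, 50.0, 1.0, 1.0])
theta = np.ones(4)
G = np.array([[1.0, 1.0], [1.0, -1.0], [0.0, 0.0], [0.0, 0.0]])  # spans e1, e2
theta_new = subspace_newton_step(theta, A @ theta, G, lambda v: A @ v, lr=0.1)
print(theta_new)
```

On this toy quadratic the subspace is exactly aligned with the high-curvature eigenvectors, so the Newton part removes the first two coordinates in a single step while the remaining coordinates shrink by the learning rate. That idealized behaviour says nothing about stochastic deep-learning training, where the abstract reports that a naive implementation slows convergence.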
