LGOCJul 17, 2022

SP2: A Second Order Stochastic Polyak Method

arXiv:2207.08171v118 citationsh-index: 35
Originality Incremental advance
AI Analysis

This work addresses the optimization efficiency challenge for machine learning practitioners, offering an incremental improvement over existing stochastic Polyak methods.

The authors tackled the problem of adaptive step size selection in stochastic gradient descent by developing SP2, a second-order stochastic Polyak method that uses Hessian-vector products to accelerate convergence, showing competitive performance on tasks like matrix completion and logistic regression with specific speed-ups reported.

Recently the "SP" (Stochastic Polyak step size) method has emerged as a competitive adaptive method for setting the step sizes of SGD. SP can be interpreted as a method specialized to interpolated models, since it solves the interpolation equations. SP solves these equation by using local linearizations of the model. We take a step further and develop a method for solving the interpolation equations that uses the local second-order approximation of the model. Our resulting method SP2 uses Hessian-vector products to speed-up the convergence of SP. Furthermore, and rather uniquely among second-order methods, the design of SP2 in no way relies on positive definite Hessian matrices or convexity of the objective function. We show SP2 is very competitive on matrix completion, non-convex test problems and logistic regression. We also provide a convergence theory on sums-of-quadratics.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes