Stochastic Gradient Methods with Preconditioned Updates
This work addresses a bottleneck in optimization for machine learning practitioners by enhancing gradient methods to handle ill-conditioned problems more effectively, though it is incremental as it builds on existing algorithms.
The paper tackles the problem of poor performance in non-convex finite sum minimization when dealing with badly scaled or ill-conditioned problems by introducing scaled algorithms like Scaled SARAH and Scaled L-SVRG, which incorporate a preconditioner based on Hutchinson's Hessian diagonal approximation, resulting in improved practical performance as shown in numerical experiments.
This work considers the non-convex finite sum minimization problem. There are several algorithms for such problems, but existing methods often work poorly when the problem is badly scaled and/or ill-conditioned, and a primary goal of this work is to introduce methods that alleviate this issue. Thus, here we include a preconditioner based on Hutchinson's approach to approximating the diagonal of the Hessian, and couple it with several gradient-based methods to give new scaled algorithms: Scaled SARAH and Scaled L-SVRG. Theoretical complexity guarantees under smoothness assumptions are presented. We prove linear convergence when both smoothness and the PL condition are assumed. Our adaptively scaled methods use approximate partial second-order curvature information and, therefore, can better mitigate the impact of badly scaled problems. This improved practical performance is demonstrated in the numerical experiments also presented in this work.