LG · OC · Dec 28, 2023

SANIA: Polyak-type Optimization Framework Leads to Scale Invariant Stochastic Algorithms

Farshed Abdukhakimov, Chulu Xiang, Dmitry Kamzolov, Robert Gower, Martin Takáč

arXiv:2312.17369v1 · 6 citations · h-index: 9 · Has Code

Originality: Incremental advance

AI Analysis

This work offers machine learning practitioners an incremental improvement: it removes the need to hand-tune the step size in adaptive optimization algorithms.

The paper tackles manual step-size tuning in adaptive optimization methods for deep neural networks. It introduces SANIA, a Polyak-type optimization framework that removes the need for this tuning and handles poorly scaled problems, and it reports competitive performance on classification tasks.
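As background on what "Polyak-type" means, the sketch below shows the classical Polyak step size, which sets the step from the current loss gap and gradient norm instead of a tuned constant. It is an illustration only, not the SANIA update: the function name `polyak_step`, the toy least-squares problem, and the assumption that the optimal loss value `f_star` is known (here 0, since the system is consistent) are all choices made for this example.

```python
import numpy as np

def polyak_step(w, loss, grad, f_star=0.0, eps=1e-12):
    """One gradient step using the classical Polyak step size.

    step = (f(w) - f_star) / ||grad f(w)||^2
    f_star is assumed known; eps only guards against division by zero.
    """
    g = grad(w)
    gap = loss(w) - f_star
    step = gap / (np.dot(g, g) + eps)
    return w - step * g

# Toy problem (assumption): consistent least squares, so the minimum loss is 0.
rng = np.random.default_rng(0)
A = rng.normal(size=(20, 5))
w_true = rng.normal(size=5)
b = A @ w_true

loss = lambda w: 0.5 * np.sum((A @ w - b) ** 2)
grad = lambda w: A.T @ (A @ w - b)

w = np.zeros(5)
initial_loss = loss(w)
initial_dist = np.linalg.norm(w - w_true)
for _ in range(200):
    w = polyak_step(w, loss, grad)  # no learning rate is passed anywhere
final_loss = loss(w)
final_dist = np.linalg.norm(w - w_true)
```

No learning rate appears in the loop; the step length comes entirely from the loss gap and the gradient. The price is that `f_star` must be known or estimated, which is reasonable when the model can fit the data exactly.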

Adaptive optimization methods are widely recognized as among the most popular approaches for training Deep Neural Networks (DNNs). Techniques such as Adam, AdaGrad, and AdaHessian utilize a preconditioner that modifies the search direction by incorporating information about the curvature of the objective function. However, despite their adaptive characteristics, these methods still require manual fine-tuning of the step-size. This, in turn, impacts the time required to solve a particular problem. This paper presents an optimization framework named SANIA to tackle these challenges. Beyond eliminating the need for manual step-size hyperparameter settings, SANIA incorporates techniques to address poorly scaled or ill-conditioned problems. We also explore several preconditioning methods, including Hutchinson's method, which approximates the Hessian diagonal of the loss function. We conclude with an extensive empirical examination of the proposed techniques across classification tasks, covering both convex and non-convex contexts.
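The two ingredients named in the abstract, a Hutchinson estimate of the Hessian diagonal and a Polyak-type step, can be combined in a few lines. The following is a minimal sketch under stated assumptions, not the authors' implementation: `hutchinson_diag` and `preconditioned_polyak_step` are hypothetical names, the Hessian-vector product is supplied as a plain matrix multiply on a toy quadratic, the optimal value `f_star` is taken to be 0, and the update shown is a generic diagonally preconditioned Polyak step that may differ from the exact SANIA rule.

```python
import numpy as np

def hutchinson_diag(hvp, dim, num_samples=10, rng=None):
    """Estimate diag(H) as the average of z * (H z) over Rademacher vectors z.

    hvp: callable returning the Hessian-vector product H @ v.
    """
    rng = np.random.default_rng() if rng is None else rng
    est = np.zeros(dim)
    for _ in range(num_samples):
        z = rng.choice([-1.0, 1.0], size=dim)  # Rademacher sample
        est += z * hvp(z)
    return est / num_samples

def preconditioned_polyak_step(w, loss_val, g, d, f_star=0.0, eps=1e-12):
    """Polyak-type step with a diagonal preconditioner D = diag(d), d > 0.

    w_new = w - (f(w) - f_star) / (g^T D^{-1} g) * D^{-1} g
    """
    d_inv_g = g / d
    step = (loss_val - f_star) / (np.dot(g, d_inv_g) + eps)
    return w - step * d_inv_g

# Toy poorly scaled quadratic (assumption): f(w) = 0.5 * w^T H w, minimum 0.
H = np.diag([1e-3, 1.0, 1e3])
hvp = lambda v: H @ v

# Absolute value keeps the preconditioner positive if an estimate is negative.
d = np.abs(hutchinson_diag(hvp, dim=3, num_samples=5,
                           rng=np.random.default_rng(0)))

w0 = np.array([1.0, 1.0, 1.0])
loss_val = 0.5 * w0 @ H @ w0
g = H @ w0
w1 = preconditioned_polyak_step(w0, loss_val, g, d)
```

On this diagonal quadratic the estimate recovers the diagonal of `H` exactly, and the preconditioned step shrinks every coordinate by the same factor of one half even though the curvatures differ by six orders of magnitude. That uniform progress across badly scaled coordinates is the behaviour that preconditioning aims for. For a real network, `hvp` would come from automatic differentiation rather than an explicit matrix.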
