LG · AI · OC · Mar 5, 2024

Remove that Square Root: A New Efficient Scale-Invariant Version of AdaGrad

Sayantan Choudhury, Nazarii Tupitsa, Nicolas Loizou, Samuel Horvath, Martin Takac, Eduard Gorbunov

arXiv:2403.02648v4 · 13.49 citations · h-index: 21 · Has Code · NIPS

Originality: Incremental advance

AI Analysis

This incremental improvement addresses the need for more efficient and robust optimization algorithms for machine learning practitioners, particularly in complex tasks like image and text classification.

The paper addresses the cost of learning-rate tuning in adaptive optimization by introducing KATE, a scale-invariant version of AdaGrad. For smooth non-convex problems KATE achieves a convergence rate of $O\left(\frac{\log T}{\sqrt{T}}\right)$. In image and text classification experiments it outperforms AdaGrad and matches or surpasses Adam.

Adaptive methods are extremely popular in machine learning because they make learning-rate tuning less expensive. This paper introduces a novel optimization algorithm named KATE, a scale-invariant adaptation of the well-known AdaGrad algorithm. We prove the scale-invariance of KATE for the case of Generalized Linear Models. Moreover, for general smooth non-convex problems, we establish a convergence rate of $O \left(\frac{\log T}{\sqrt{T}} \right)$ for KATE, matching the best-known rates for AdaGrad and Adam. We also compare KATE with the state-of-the-art adaptive algorithms Adam and AdaGrad in numerical experiments on a range of problems, including complex machine learning tasks such as image classification and text classification on real data. The results indicate that KATE consistently outperforms AdaGrad and matches or surpasses the performance of Adam in all considered scenarios.
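The title's idea, removing the square root, can be illustrated on a Generalized Linear Model. If every input is rescaled as $x \mapsto Vx$ for a positive diagonal $V$, a scale-invariant method should produce iterates $V^{-1}w$, so the predictions $\langle x, w\rangle$ do not change. The gradient then becomes $Vg$. AdaGrad's step $g/\sqrt{\sum g^2}$ is unchanged by this rescaling, so it does not shrink by $1/v$ as required. Dividing by $\sum g^2$ without the square root gives a step that scales as $V^{-1}$. The sketch below is a minimal pure-Python illustration and not the authors' implementation. It assumes a KATE-style update with $\eta = 0$ and zero initial accumulators, $b_t^2 = \sum g^2$, $m_t^2 = \sum g^2/b^2$, $w \leftarrow w - \beta\, m_t\, g / b_t^2$, as we understand the paper's algorithm, so check it against the paper. The names `kate_eta0`, `relative_prediction_gap`, the logistic-regression data, and the scaling vector are hypothetical.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function 1 / (1 + e^{-z})."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def grad_logistic(w, X, y):
    """Full-batch gradient of the mean logistic loss log(1 + exp(-y <x, w>)), y in {-1, +1}."""
    n, d = len(X), len(w)
    g = [0.0] * d
    for x, yi in zip(X, y):
        margin = yi * sum(xj * wj for xj, wj in zip(x, w))
        coef = -yi * sigmoid(-margin)  # derivative of the loss with respect to <x, w>
        for j in range(d):
            g[j] += coef * x[j] / n
    return g


def adagrad(X, y, steps, lr):
    """Diagonal AdaGrad: step = lr * g / sqrt(sum of squared gradients)."""
    d = len(X[0])
    w, b2 = [0.0] * d, [0.0] * d
    for _ in range(steps):
        g = grad_logistic(w, X, y)
        for j in range(d):
            b2[j] += g[j] ** 2
            if b2[j] > 0.0:  # skip coordinates that have only seen zero gradients
                w[j] -= lr * g[j] / math.sqrt(b2[j])
    return w


def kate_eta0(X, y, steps, beta):
    """Hypothetical KATE-style update, ASSUMING eta = 0 and b_0 = m_0 = 0 (check the paper)."""
    d = len(X[0])
    w, b2, m2 = [0.0] * d, [0.0] * d, [0.0] * d
    for _ in range(steps):
        g = grad_logistic(w, X, y)
        for j in range(d):
            b2[j] += g[j] ** 2
            if b2[j] > 0.0:
                m2[j] += g[j] ** 2 / b2[j]  # ratio is unchanged when feature j is rescaled
                w[j] -= beta * math.sqrt(m2[j]) * g[j] / b2[j]  # no square root on b2
    return w


def predictions(X, w):
    return [sum(xj * wj for xj, wj in zip(x, w)) for x in X]


def make_data(n=40, d=3, seed=0):
    """Synthetic, hypothetical binary classification data."""
    rng = random.Random(seed)
    w_true = [rng.gauss(0.0, 1.0) for _ in range(d)]
    X = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    y = [1.0 if sum(a * b for a, b in zip(x, w_true)) + 0.5 * rng.gauss(0.0, 1.0) > 0 else -1.0
         for x in X]
    return X, y


def relative_prediction_gap(opt, X, y, scale, steps=50, step=0.1):
    """Train on X and on the rescaled features V x, then compare the resulting predictions."""
    Xs = [[v * xj for v, xj in zip(scale, x)] for x in X]
    p = predictions(X, opt(X, y, steps, step))
    ps = predictions(Xs, opt(Xs, y, steps, step))
    return max(abs(a - b) for a, b in zip(p, ps)) / max(1.0, max(abs(a) for a in p))


X, y = make_data()
scale = [10.0, 0.1, 3.0]  # hypothetical diagonal rescaling V
for name, opt in [("AdaGrad", adagrad), ("KATE-style, eta = 0", kate_eta0)]:
    print(name, relative_prediction_gap(opt, X, y, scale))
```

Under the assumed update, $g^2/b^2$ is unchanged by the rescaling, so $m$ is unchanged, while $g/b^2$ picks up a factor $v/v^2 = 1/v$. The square-root-free sketch therefore reproduces the same predictions up to rounding error, whereas AdaGrad's predictions move when features are rescaled.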

