LG · AI · OC · Mar 5, 2024

Remove that Square Root: A New Efficient Scale-Invariant Version of AdaGrad

arXiv:2403.02648v4 · 9 citations · h-index: 21 · NIPS
Originality: Incremental advance
AI Analysis

This incremental advance addresses machine learning practitioners' need for more efficient and robust optimization algorithms, particularly for complex tasks such as image and text classification.

The paper targets the cost of learning rate tuning in adaptive optimization methods by introducing KATE, a scale-invariant version of AdaGrad. KATE achieves a convergence rate of $O\left(\frac{\log T}{\sqrt{T}}\right)$ for smooth non-convex problems, outperforms AdaGrad, and matches or surpasses Adam in image and text classification tasks.
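Why removing the square root can buy scale invariance can be sketched for a generalized linear model under simplifying assumptions: features are rescaled by a positive diagonal matrix $V = \mathrm{diag}(v_1, \dots, v_d)$, the squared-gradient accumulator starts at zero, and the update only changes the denominator. This is an illustration of the idea behind the title, not KATE's full update or the paper's proof.

$$
\begin{aligned}
f(w) &= \frac{1}{n}\sum_{i=1}^{n}\phi_i\!\left(x_i^\top w\right),\qquad x_i' = V x_i,\quad w' = V^{-1} w \;\Rightarrow\; \nabla f'(w') = V\,\nabla f(w),\\
\text{AdaGrad:}\quad \frac{g'_{t,j}}{\sqrt{\sum_{s\le t} g'^{2}_{s,j}}} &= \frac{v_j\, g_{t,j}}{v_j\sqrt{\sum_{s\le t} g_{s,j}^{2}}} = \frac{g_{t,j}}{\sqrt{\sum_{s\le t} g_{s,j}^{2}}},\\
\text{no square root:}\quad \frac{g'_{t,j}}{\sum_{s\le t} g'^{2}_{s,j}} &= \frac{v_j\, g_{t,j}}{v_j^{2}\sum_{s\le t} g_{s,j}^{2}} = \frac{1}{v_j}\cdot\frac{g_{t,j}}{\sum_{s\le t} g_{s,j}^{2}}.
\end{aligned}
$$

The AdaGrad step is unchanged by the rescaling, so it cannot track $w' = V^{-1}w$. The step without the square root shrinks by exactly $v_j^{-1}$, so by induction the rescaled iterates satisfy $w'_t = V^{-1} w_t$ for every $t$.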

Adaptive methods are extremely popular in machine learning as they make learning rate tuning less expensive. This paper introduces a novel optimization algorithm named KATE, which presents a scale-invariant adaptation of the well-known AdaGrad algorithm. We prove the scale-invariance of KATE for the case of Generalized Linear Models. Moreover, for general smooth non-convex problems, we establish a convergence rate of $O \left(\frac{\log T}{\sqrt{T}} \right)$ for KATE, matching the best-known rates for AdaGrad and Adam. We also compare KATE with the state-of-the-art adaptive algorithms Adam and AdaGrad in numerical experiments on a range of problems, including complex machine learning tasks such as image classification and text classification on real data. The results indicate that KATE consistently outperforms AdaGrad and matches or surpasses the performance of Adam in all considered scenarios.
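The scale-invariance claim for Generalized Linear Models can be checked numerically with a small sketch. It assumes logistic regression on a hypothetical toy dataset and compares diagonal AdaGrad (`power=0.5`) with a hypothetical simplified update that divides by the plain sum of squared gradients (`power=1.0`). It is not KATE itself: the paper's update includes further terms that this sketch omits, so see the paper for the exact algorithm. The names `logistic_grad`, `diag_adaptive`, and `rescale` are illustrative only.

```python
import math

def logistic_grad(w, X, y):
    """Gradient of the mean logistic loss (1/n) * sum log(1 + exp(-y_i * <x_i, w>)), y_i in {-1, +1}."""
    n, d = len(X), len(w)
    g = [0.0] * d
    for xi, yi in zip(X, y):
        margin = yi * sum(a * b for a, b in zip(xi, w))
        coef = -yi / (1.0 + math.exp(margin))  # d/dm log(1 + e^{-m}) = -1 / (1 + e^{m})
        for j in range(d):
            g[j] += coef * xi[j] / n
    return g

def diag_adaptive(X, y, beta=0.1, steps=20, power=0.5):
    """Coordinate-wise update w_j -= beta * g_j / (b2_j ** power), with b2_j the running sum of g_j^2.
    power=0.5 is diagonal AdaGrad; power=1.0 removes the square root (hypothetical simplification)."""
    d = len(X[0])
    w = [0.0] * d
    b2 = [0.0] * d  # starts at zero so the accumulator rescales exactly with the data
    for _ in range(steps):
        g = logistic_grad(w, X, y)
        for j in range(d):
            b2[j] += g[j] ** 2
            w[j] -= beta * g[j] / b2[j] ** power
    return w

def rescale(X, v):
    """Multiply feature j of every sample by v[j] (a diagonal change of units)."""
    return [[xij * vj for xij, vj in zip(xi, v)] for xi in X]

# Toy, non-separable data chosen so no gradient coordinate is zero at w = 0.
X = [[1.0, 2.0], [2.0, -1.0], [-1.5, 0.5], [0.5, 1.5], [1.5, 2.5]]
y = [1, -1, -1, 1, -1]
v = [10.0, 0.1]

for power in (0.5, 1.0):
    w = diag_adaptive(X, y, power=power)
    w_scaled = diag_adaptive(rescale(X, v), y, power=power)
    # Scale invariance means w_scaled == w / v, i.e. the same predictor expressed in the new units.
    invariant = all(math.isclose(ws, wj / vj, rel_tol=1e-6, abs_tol=1e-9)
                    for ws, wj, vj in zip(w_scaled, w, v))
    print(f"power={power}: scale-invariant iterates? {invariant}")
```

The zero-initialized accumulator matters here: a nonzero starting value (or an added epsilon) would not rescale with the features and would make the invariance only approximate.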

Code Implementations: 1 repo
