Towards Simple and Provable Parameter-Free Adaptive Gradient Methods
The work addresses inefficiencies in deep learning training by providing provably convergent, parameter-free alternatives to widely used optimizers. Its contribution is incremental, since it builds on existing methods.
The paper tackles the problem of ad hoc learning rate tuning in optimization algorithms such as AdaGrad and Adam. It introduces AdaGrad++ and Adam++, simple parameter-free variants with formal convergence guarantees. Both achieve convergence rates comparable to those of their counterparts without assuming a predefined learning rate, and experiments show competitive performance.
Optimization algorithms such as AdaGrad and Adam have significantly advanced the training of deep models by dynamically adjusting the learning rate during the optimization process. However, ad hoc tuning of learning rates poses a challenge, leading to inefficiencies in practice. To address this issue, recent research has focused on developing "learning-rate-free" or "parameter-free" algorithms that operate effectively without the need for learning rate tuning. Despite these efforts, existing parameter-free variants of AdaGrad and Adam tend to be overly complex, lack formal convergence guarantees, or both. In this paper, we present AdaGrad++ and Adam++, novel and simple parameter-free variants of AdaGrad and Adam with convergence guarantees. We prove that AdaGrad++ achieves convergence rates comparable to those of AdaGrad in convex optimization without predefined learning rate assumptions. Similarly, Adam++ matches the convergence rate of Adam without relying on any conditions on the learning rates. Experimental results across various deep learning tasks validate the competitive performance of AdaGrad++ and Adam++.
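To make the tuning problem concrete, the following is a minimal sketch of the standard diagonal AdaGrad baseline, not of AdaGrad++ or Adam++, whose update rules are not given in this abstract. The toy objective, the starting point, and the helper names (`adagrad`, `quadratic_grad`, `loss`) are assumptions chosen only for illustration. The sketch shows that the outcome after a fixed number of steps depends on the base learning rate `lr`, the quantity that parameter-free methods aim to remove.

```python
import math

def adagrad(grad_fn, x0, lr, steps=500, eps=1e-8):
    """Standard diagonal AdaGrad; `lr` is the base learning rate set by hand."""
    x = list(x0)
    accum = [0.0] * len(x)  # per-coordinate running sum of squared gradients
    for _ in range(steps):
        g = grad_fn(x)
        for i, gi in enumerate(g):
            accum[i] += gi * gi
            # Per-coordinate step: base rate scaled by accumulated gradient magnitude.
            x[i] -= lr * gi / (math.sqrt(accum[i]) + eps)
    return x

def quadratic_grad(x):
    # Gradient of the hypothetical toy objective f(x) = 0.5 * (x0^2 + 100 * x1^2).
    return [x[0], 100.0 * x[1]]

def loss(x):
    return 0.5 * (x[0] ** 2 + 100.0 * x[1] ** 2)

start = [5.0, 5.0]  # illustrative starting point
# Final loss after the same number of steps for three hand-picked base rates.
results = {lr: loss(adagrad(quadratic_grad, start, lr)) for lr in (0.001, 0.1, 1.0)}

for lr, final_loss in results.items():
    print(f"lr={lr}: final loss = {final_loss:.3e}")
```

On this toy problem the smallest base rate barely moves from the starting point, while the largest makes far more progress in the same number of steps. A parameter-free variant would have to choose that scale on its own.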