Interpreting Adaptive Gradient Methods by Parameter Scaling for Learning-Rate-Free Optimization
This work extends learning-rate-free methods to adaptive gradient optimization, which could simplify training for deep learning practitioners; however, it appears incremental, since it builds on existing learning-rate-free approaches for steepest descent.
The paper tackles the challenge of estimating learning rates for adaptive gradient methods in deep neural network training. It interprets these methods as steepest descent on parameter-scaled networks, which enables learning-rate-free adaptive gradient methods. Experimental results show performance comparable to that obtained with hand-tuned learning rates across various scenarios.
We address the challenge of estimating the learning rate for adaptive gradient methods used in training deep neural networks. Several learning-rate-free approaches have been proposed, but they are typically tailored to steepest descent. Although steepest descent methods offer an intuitive approach to finding minima, many deep learning applications require adaptive gradient methods to achieve faster convergence. In this paper, we interpret adaptive gradient methods as steepest descent applied to parameter-scaled networks and use this interpretation to propose learning-rate-free adaptive gradient methods. Experimental results verify the effectiveness of this approach, demonstrating performance comparable to that achieved with hand-tuned learning rates across various scenarios. This work extends the applicability of learning-rate-free methods to training with adaptive gradient methods.
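The abstract does not spell out the reparameterization, so the following is only a minimal sketch of why such an interpretation is possible, not the paper's method. It assumes a diagonal RMSprop/Adam-style update without momentum or bias correction, with the second-moment estimate `v` held fixed during the step. Writing the parameters as theta = s * phi with per-parameter scale s = 1 / sqrt(sqrt(v) + eps), a plain gradient step on phi (gradient s * g by the chain rule) maps back to theta - lr * s**2 * g, which is exactly the adaptive update. The function names `adaptive_step` and `scaled_sgd_step` are hypothetical.

```python
import numpy as np

def adaptive_step(theta, grad, v, lr, eps=1e-8):
    """Diagonal adaptive update (RMSprop/Adam-style, no momentum or bias correction)."""
    return theta - lr * grad / (np.sqrt(v) + eps)

def scaled_sgd_step(theta, grad, v, lr, eps=1e-8):
    """Plain gradient step on phi, where theta = s * phi with s held fixed."""
    s = 1.0 / np.sqrt(np.sqrt(v) + eps)  # assumed per-parameter scale
    phi = theta / s                       # parameters of the scaled network
    grad_phi = s * grad                   # chain rule: dL/dphi = s * dL/dtheta
    phi = phi - lr * grad_phi             # steepest (gradient) descent in phi
    return s * phi                        # map back: theta - lr * s**2 * grad

# Usage with random, hypothetical values.
rng = np.random.default_rng(0)
theta = rng.normal(size=5)
grad = rng.normal(size=5)
v = 0.9 * grad**2 + 0.1                   # hypothetical second-moment estimate
a = adaptive_step(theta, grad, v, lr=1e-2)
b = scaled_sgd_step(theta, grad, v, lr=1e-2)
print(np.allclose(a, b))                  # prints True
```

Because the two updates coincide, a step-size rule designed for steepest descent could, in principle, be applied in the scaled coordinates phi; how the paper handles a scale that changes from step to step is not described in this abstract.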