Parameter-free version of Adaptive Gradient Methods for Strongly-Convex Functions
This work targets more robust optimization in machine learning by removing the need to tune parameters. Its contribution is incremental, since it builds on existing universal algorithms such as Metagrad.
The paper tackles the problem of adaptive gradient methods requiring prior knowledge of strong convexity parameters and learning rates, and presents a parameter-free version that achieves O(d log T) regret bounds.
Adaptive gradient methods applied to λ-strongly convex functions are usually tuned with knowledge of the strong-convexity parameter λ and a learning rate η. In this paper, we adapt a universal algorithm along the lines of Metagrad to remove this dependence on λ and η. The main idea is to run multiple experts concurrently and combine their predictions in a master algorithm. This master enjoys an O(d log T) regret bound.
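The expert-and-master idea can be illustrated with a deliberately simplified sketch. It is not the paper's algorithm. It assumes each expert is projected online gradient descent with step size η/t for one η from a grid, and that the master combines them with plain exponential weights on linearized losses, whereas Metagrad uses a tilted variant. The names run_master, grad_fn, etas, radius and master_lr are hypothetical, and no regret guarantee is claimed for this version.

```python
import numpy as np

def run_master(grad_fn, dim, T, etas, radius=1.0, master_lr=1.0):
    """Illustrative master over gradient-descent experts (not the paper's method).

    grad_fn(x, t) returns the gradient of the round-t loss at x.
    etas is the grid of candidate learning rates, one expert per value.
    """
    K = len(etas)
    experts = np.zeros((K, dim))   # row k holds expert k's current iterate
    log_w = np.zeros(K)            # master's log-weights over the experts
    w = np.full(K, 1.0 / K)
    iterates = []
    for t in range(1, T + 1):
        # Master prediction: weighted average of the experts' iterates.
        w = np.exp(log_w - log_w.max())
        w /= w.sum()
        x = w @ experts
        iterates.append(x)

        # Only the gradient at the master's point is observed.
        g = grad_fn(x, t)

        # Linearized loss of each expert relative to the master: <g, x_k - x>.
        log_w -= master_lr * ((experts - x) @ g)

        # Each expert takes a projected gradient step with its own step size.
        for k, eta in enumerate(etas):
            step = experts[k] - (eta / t) * g
            norm = np.linalg.norm(step)
            experts[k] = step if norm <= radius else step * (radius / norm)
    return np.array(iterates), w

# Usage on a toy strongly convex loss f(x) = 0.5 * ||x - target||^2.
target = np.array([0.3, -0.2])
iterates, weights = run_master(
    lambda x, t: x - target, dim=2, T=200, etas=[0.1, 1.0, 10.0]
)
print(np.linalg.norm(iterates[-1] - target), weights)
```

No single value of η has to be chosen in advance: the grid stands in for the unknown tuning, and the master shifts weight toward experts whose iterates would have lowered the linearized loss.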