Gradient Methods with Online Scaling
This work addresses optimization efficiency for machine learning practitioners by providing provable acceleration with online scaling, though it is incremental relative to existing preconditioning methods.
The paper tackles accelerating gradient-based methods by learning to scale gradients online, achieving an O(κ* log(1/ε)) complexity for smooth strongly convex optimization, improving on previous O(√n κ* log(1/ε)) results, and proving superlinear convergence on convex quadratics.
We introduce a framework to accelerate the convergence of gradient-based methods with online learning. The framework learns to scale the gradient at each iteration through an online learning algorithm and provably accelerates gradient-based methods asymptotically. In contrast with previous literature, where convergence is established based on worst-case analysis, our framework provides a strong convergence guarantee with respect to the optimal scaling matrix for the iteration trajectory. For smooth strongly convex optimization, our results provide an $O(κ^\star \log(1/\varepsilon)$) complexity result, where $κ^\star$ is the condition number achievable by the optimal preconditioner, improving on the previous $O(\sqrt{n}κ^\star \log(1/\varepsilon))$ result. In particular, a variant of our method achieves superlinear convergence on convex quadratics. For smooth convex optimization, we show for the first time that the widely-used hypergradient descent heuristic improves on the convergence of gradient descent.