SVRG Meets AdaGrad: Painless Variance Reduction
This work addresses the practical challenge of hyperparameter tuning in variance reduction methods for machine learning practitioners, making these methods more accessible and robust.
This paper introduces AdaSVRG, a robust variant of the SVRG variance reduction method that integrates AdaGrad to eliminate the need for problem-dependent constants. It achieves an O(ε)-suboptimality with Õ(n + 1/ε) gradient evaluations, matching typical rates without requiring prior knowledge of constants.
Variance reduction (VR) methods for finite-sum minimization typically require the knowledge of problem-dependent constants that are often unknown and difficult to estimate. To address this, we use ideas from adaptive gradient methods to propose AdaSVRG, which is a more robust variant of SVRG, a common VR method. AdaSVRG uses AdaGrad in the inner loop of SVRG, making it robust to the choice of step-size. When minimizing a sum of n smooth convex functions, we prove that a variant of AdaSVRG requires $\tilde{O}(n + 1/ε)$ gradient evaluations to achieve an $O(ε)$-suboptimality, matching the typical rate, but without needing to know problem-dependent constants. Next, we leverage the properties of AdaGrad to propose a heuristic that adaptively determines the length of each inner-loop in AdaSVRG. Via experiments on synthetic and real-world datasets, we validate the robustness and effectiveness of AdaSVRG, demonstrating its superior performance over standard and other "tune-free" VR methods.