Stop Wasting My Gradients: Practical SVRG
This work addresses the cost of gradient computation in stochastic optimization for machine learning, offering practical refinements to existing stochastic variance-reduced gradient (SVRG) methods.
The paper tackles inefficiencies in stochastic variance-reduced gradient (SVRG) methods by proposing strategies that reduce the number of gradient computations. It shows that the convergence rate is preserved when the error in the control variate decreases over the iterations, which justifies growing-batch variants that compute fewer gradients in the early iterations. Further variants exploit support vectors to avoid gradient computations in the later iterations, and the commonly used regularized SVRG iteration is proven to improve the convergence rate.
We present and analyze several strategies for improving the performance of stochastic variance-reduced gradient (SVRG) methods. We first show that the convergence rate of these methods can be preserved under a decreasing sequence of errors in the control variate, and use this to derive variants of SVRG that use growing-batch strategies to reduce the number of gradient calculations required in the early iterations. We further (i) show how to exploit support vectors to reduce the number of gradient computations in the later iterations, (ii) prove that the commonly-used regularized SVRG iteration is justified and improves the convergence rate, (iii) consider alternate mini-batch selection strategies, and (iv) consider the generalization error of the method.
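The growing-batch idea can be illustrated with a minimal sketch. It is not the authors' implementation: the L2-regularized logistic loss, the doubling batch schedule, the step size, and every function and parameter name below (`batch_grad`, `objective`, `svrg_growing_batch`, `batch0`, `growth`) are assumptions chosen for illustration. The only element taken from the abstract is that the control variate (the snapshot gradient) is estimated on a subsample whose size grows across outer iterations, so its error decreases over time.

```python
import numpy as np


def batch_grad(w, X, y, lam, idx):
    """Mean gradient of the L2-regularized logistic loss over rows idx."""
    Xi, yi = X[idx], y[idx]
    margins = yi * (Xi @ w)
    # sigmoid(-m) written with tanh so large margins do not overflow
    coeff = -yi * 0.5 * (1.0 - np.tanh(0.5 * margins))
    return Xi.T @ coeff / len(idx) + lam * w


def objective(w, X, y, lam):
    """Full regularized logistic objective, used only for monitoring."""
    return np.mean(np.logaddexp(0.0, -y * (X @ w))) + 0.5 * lam * (w @ w)


def svrg_growing_batch(X, y, lam=0.1, step=0.02, epochs=20,
                       batch0=8, growth=2.0, seed=0):
    """SVRG whose snapshot gradient uses a growing subsample (illustrative).

    Returns the final iterate and the number of per-example gradients
    spent on snapshot (control variate) estimates.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w_snap = np.zeros(d)
    batch = float(batch0)
    snapshot_evals = 0
    for _ in range(epochs):
        # Inexact control variate: average gradient over a subsample.
        # Once the batch reaches n this is the exact full gradient.
        size = min(n, int(batch))
        idx = rng.choice(n, size=size, replace=False)
        mu = batch_grad(w_snap, X, y, lam, idx)
        snapshot_evals += size

        w = w_snap.copy()
        for _ in range(n):  # assumed inner-loop length: one pass of n steps
            i = rng.integers(n, size=1)
            # Variance-reduced direction: g_i(w) - g_i(w_snap) + mu
            direction = (batch_grad(w, X, y, lam, i)
                         - batch_grad(w_snap, X, y, lam, i) + mu)
            w -= step * direction
        w_snap = w
        batch *= growth  # assumed geometric growth of the snapshot batch
    return w_snap, snapshot_evals
```

With the defaults above and n = 200 examples, the snapshot batches are 8, 16, 32, 64, 128 and then 200 for all remaining epochs, so the early epochs spend far fewer gradients on the control variate than standard SVRG, which would use all n examples every time. The growth factor and initial batch size would need tuning for a real problem.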