On Variance Reduction in Stochastic Gradient Descent and its Asynchronous Variants
This work addresses how to scale variance reduction techniques for stochastic gradient descent to asynchronous settings, a problem that matters for large-scale machine learning applications. The contribution is incremental in that it builds on existing algorithms.
The paper tackles the lack of asynchronous versions of variance reduction algorithms such as SVRG and SAGA, which are crucial for large-scale applications. It proposes a unifying framework and an asynchronous algorithm that achieves near-linear speedup in sparse settings, and it demonstrates empirical performance through a concrete realization of asynchronous SVRG.
We study optimization algorithms based on variance reduction for stochastic gradient descent (SGD). Remarkable recent progress has been made in this direction through the development of algorithms such as SAG, SVRG, and SAGA. These algorithms have been shown to outperform SGD, both theoretically and empirically. However, asynchronous versions of these algorithms, a crucial requirement for modern large-scale applications, have not been studied. We bridge this gap by presenting a unifying framework for many variance reduction techniques. Subsequently, we propose an asynchronous algorithm grounded in our framework, and prove its fast convergence. An important consequence of our general approach is that it yields asynchronous versions of variance reduction algorithms such as SVRG and SAGA as a byproduct. Our method achieves near-linear speedup in sparse settings common to machine learning. We demonstrate the empirical performance of our method through a concrete realization of asynchronous SVRG.
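For readers unfamiliar with the variance-reduced update that these methods share, the following is a minimal sketch of standard serial SVRG on a small least-squares problem. It is not the asynchronous algorithm proposed in the paper, and it does not reproduce the unifying framework. The function name `svrg_least_squares`, its parameters, and the default step size and loop lengths are illustrative assumptions. Each inner step corrects a single-sample gradient with the same sample's gradient at a stored snapshot plus the full gradient at that snapshot.

```python
import random


def svrg_least_squares(xs, ys, step=0.05, epochs=30, inner_steps=100, seed=0):
    """Serial SVRG on f(w) = (1/n) * sum_i 0.5 * (w . x_i - y_i)**2.

    Illustrative sketch only: names and default values are assumptions,
    not taken from the paper.
    """
    rng = random.Random(seed)
    n, d = len(xs), len(xs[0])

    def grad_i(w, i):
        # Gradient of the i-th term: (w . x_i - y_i) * x_i
        residual = sum(wj * xj for wj, xj in zip(w, xs[i])) - ys[i]
        return [residual * xj for xj in xs[i]]

    w = [0.0] * d
    for _ in range(epochs):
        # Outer loop: store a snapshot and compute the full gradient there.
        snapshot = list(w)
        full_grad = [0.0] * d
        for i in range(n):
            g = grad_i(snapshot, i)
            for j in range(d):
                full_grad[j] += g[j] / n

        # Inner loop: variance-reduced stochastic steps.
        for _ in range(inner_steps):
            i = rng.randrange(n)
            g_cur = grad_i(w, i)
            g_snap = grad_i(snapshot, i)
            for j in range(d):
                # Unbiased gradient estimate whose variance shrinks as
                # w and the snapshot both approach the minimizer.
                w[j] -= step * (g_cur[j] - g_snap[j] + full_grad[j])
    return w
```

Calling `svrg_least_squares([[1, 0], [0, 1], [1, 1], [1, -1]], [2, -1, 1, 3])` fits data generated by the weights (2, -1). In an asynchronous variant of the kind the abstract describes, several workers would run the inner loop concurrently against shared parameters. That part is deliberately left out here.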