MLITLGOCMar 22, 2016

Trading-off variance and complexity in stochastic gradient descent

arXiv:1603.06861v113 citations
Originality Incremental advance
AI Analysis

This work addresses the efficiency and convergence issues in large-scale machine learning optimization, offering an incremental improvement over existing variance-reduction techniques like SVRG.

The authors tackled the high variance and computational cost of stochastic gradient descent by proposing CheapSVRG, a method that uses a surrogate gradient computed on a small data subset to achieve a linear convergence rate with a trade-off between complexity and convergence, performing competitively compared to state-of-the-art methods.

Stochastic gradient descent is the method of choice for large-scale machine learning problems, by virtue of its light complexity per iteration. However, it lags behind its non-stochastic counterparts with respect to the convergence rate, due to high variance introduced by the stochastic updates. The popular Stochastic Variance-Reduced Gradient (SVRG) method mitigates this shortcoming, introducing a new update rule which requires infrequent passes over the entire input dataset to compute the full-gradient. In this work, we propose CheapSVRG, a stochastic variance-reduction optimization scheme. Our algorithm is similar to SVRG but instead of the full gradient, it uses a surrogate which can be efficiently computed on a small subset of the input data. It achieves a linear convergence rate ---up to some error level, depending on the nature of the optimization problem---and features a trade-off between the computational complexity and the convergence rate. Empirical evaluation shows that CheapSVRG performs at least competitively compared to the state of the art.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes