OCLGNAMay 27, 2019

One Method to Rule Them All: Variance Reduction for Data, Parameters and Many New Methods

arXiv:1905.11266v227 citations
Originality Synthesis-oriented
AI Analysis

This work provides a unified framework for stochastic optimization methods, addressing scalability issues in machine learning with large datasets or models, but it is incremental as it builds on and generalizes prior techniques.

The authors introduced a general variance-reduced method for regularized empirical risk minimization that unifies several existing algorithms and generates new ones, achieving linear convergence under specific assumptions and recovering or improving known rates.

We propose a remarkably general variance-reduced method suitable for solving regularized empirical risk minimization problems with either a large number of training examples, or a large model dimension, or both. In special cases, our method reduces to several known and previously thought to be unrelated methods, such as {\tt SAGA}, {\tt LSVRG}, {\tt JacSketch}, {\tt SEGA} and {\tt ISEGA}, and their arbitrary sampling and proximal generalizations. However, we also highlight a large number of new specific algorithms with interesting properties. We provide a single theorem establishing linear convergence of the method under smoothness and quasi strong convexity assumptions. With this theorem we recover best-known and sometimes improved rates for known methods arising in special cases. As a by-product, we provide the first unified method and theory for stochastic gradient and stochastic coordinate descent type methods.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes