LGOCMLMay 28, 2016

Optimal Rates for Multi-pass Stochastic Gradient Methods

arXiv:1605.08882v337 citations
Originality Incremental advance
AI Analysis

This work provides theoretical insights for machine learning practitioners using stochastic gradient descent, though it is incremental as it builds on existing analysis of gradient methods.

The paper tackles the problem of analyzing stochastic gradient methods with multiple passes and mini-batches, showing that the number of passes acts as a regularization parameter enabling optimal finite sample bounds through early-stopping, and that larger step-sizes are permissible with mini-batches.

We analyze the learning properties of the stochastic gradient method when multiple passes over the data and mini-batches are allowed. We study how regularization properties are controlled by the step-size, the number of passes and the mini-batch size. In particular, we consider the square loss and show that for a universal step-size choice, the number of passes acts as a regularization parameter, and optimal finite sample bounds can be achieved by early-stopping. Moreover, we show that larger step-sizes are allowed when considering mini-batches. Our analysis is based on a unifying approach, encompassing both batch and stochastic gradient methods as special cases. As a byproduct, we derive optimal convergence results for batch gradient methods (even in the non-attainable cases).

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes