LGOCMLSep 2, 2019

Simple and optimal high-probability bounds for strongly-convex stochastic gradient descent

arXiv:1909.00843v141 citations
Originality Incremental advance
AI Analysis

This provides a more robust and efficient solution for optimization in machine learning, though it is incremental as it builds on existing averaging strategies.

The paper tackles the problem of achieving optimal convergence rates for stochastic gradient descent on non-smooth, strongly-convex functions, proving that a simple non-uniform averaging strategy attains the optimal O(1/T) rate with high probability and outperforms standard techniques in experiments with smaller variance.

We consider stochastic gradient descent algorithms for minimizing a non-smooth, strongly-convex function. Several forms of this algorithm, including suffix averaging, are known to achieve the optimal $O(1/T)$ convergence rate in expectation. We consider a simple, non-uniform averaging strategy of Lacoste-Julien et al. (2011) and prove that it achieves the optimal $O(1/T)$ convergence rate with high probability. Our proof uses a recently developed generalization of Freedman's inequality. Finally, we compare several of these algorithms experimentally and show that this non-uniform averaging strategy outperforms many standard techniques, and with smaller variance.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes