LG OC MLSep 2, 2019

Simple and optimal high-probability bounds for strongly-convex stochastic gradient descent

Nicholas J. A. Harvey, Christopher Liaw, Sikander Randhawa

arXiv:1909.00843v114.841 citations

Originality Incremental advance

AI Analysis

This provides a more robust and efficient solution for optimization in machine learning, though it is incremental as it builds on existing averaging strategies.

The paper tackles the problem of achieving optimal convergence rates for stochastic gradient descent on non-smooth, strongly-convex functions, proving that a simple non-uniform averaging strategy attains the optimal O(1/T) rate with high probability and outperforms standard techniques in experiments with smaller variance.

We consider stochastic gradient descent algorithms for minimizing a non-smooth, strongly-convex function. Several forms of this algorithm, including suffix averaging, are known to achieve the optimal $O(1/T)$ convergence rate in expectation. We consider a simple, non-uniform averaging strategy of Lacoste-Julien et al. (2011) and prove that it achieves the optimal $O(1/T)$ convergence rate with high probability. Our proof uses a recently developed generalization of Freedman's inequality. Finally, we compare several of these algorithms experimentally and show that this non-uniform averaging strategy outperforms many standard techniques, and with smaller variance.

View on arXiv PDF

Similar