OC LGApr 18, 2020

On Tight Convergence Rates of Without-replacement SGD

arXiv:2004.08657v18.94 citations

Originality Incremental advance

AI Analysis

This work addresses limitations in prior analyses for a widely used optimization method, potentially improving theoretical understanding for machine learning practitioners.

The paper tackles the problem of analyzing convergence rates for SGD without replacement sampling in finite-sum optimization, showing that it can achieve rates better than the baseline SGD rate of O(1/(nK)) without extra poly-logarithmic factors or requiring many epochs dependent on the condition number.

For solving finite-sum optimization problems, SGD without replacement sampling is empirically shown to outperform SGD. Denoting by $n$ the number of components in the cost and $K$ the number of epochs of the algorithm , several recent works have shown convergence rates of without-replacement SGD that have better dependency on $n$ and $K$ than the baseline rate of $O(1/(nK))$ for SGD. However, there are two main limitations shared among those works: the rates have extra poly-logarithmic factors on $nK$, and denoting by $κ$ the condition number of the problem, the rates hold after $κ^c\log(nK)$ epochs for some $c>0$. In this work, we overcome these limitations by analyzing step sizes that vary across epochs.

View on arXiv PDF

Similar