OCLGApr 18, 2020

On Tight Convergence Rates of Without-replacement SGD

arXiv:2004.08657v14 citations
Originality Incremental advance
AI Analysis

This work addresses limitations in prior analyses for a widely used optimization method, potentially improving theoretical understanding for machine learning practitioners.

The paper tackles the problem of analyzing convergence rates for SGD without replacement sampling in finite-sum optimization, showing that it can achieve rates better than the baseline SGD rate of O(1/(nK)) without extra poly-logarithmic factors or requiring many epochs dependent on the condition number.

For solving finite-sum optimization problems, SGD without replacement sampling is empirically shown to outperform SGD. Denoting by $n$ the number of components in the cost and $K$ the number of epochs of the algorithm , several recent works have shown convergence rates of without-replacement SGD that have better dependency on $n$ and $K$ than the baseline rate of $O(1/(nK))$ for SGD. However, there are two main limitations shared among those works: the rates have extra poly-logarithmic factors on $nK$, and denoting by $κ$ the condition number of the problem, the rates hold after $κ^c\log(nK)$ epochs for some $c>0$. In this work, we overcome these limitations by analyzing step sizes that vary across epochs.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes