Convex SGD: Generalization Without Early Stopping
This provides a theoretical foundation for SGD generalization in convex optimization, benefiting researchers and practitioners in machine learning by addressing a key bottleneck in understanding algorithm performance.
The paper tackles the generalization error of stochastic gradient descent (SGD) on smooth convex functions, showing a bound that vanishes as iterations and dataset size increase, specifically scaling as ̃O(1/√T + 1/√n) with step-size 1/√t, and proving that strong convexity is not required for good generalization.
We consider the generalization error associated with stochastic gradient descent on a smooth convex function over a compact set. We show the first bound on the generalization error that vanishes when the number of iterations $T$ and the dataset size $n$ go to zero at arbitrary rates; our bound scales as $\tilde{O}(1/\sqrt{T} + 1/\sqrt{n})$ with step-size $α_t = 1/\sqrt{t}$. In particular, strong convexity is not needed for stochastic gradient descent to generalize well.