Iterate averaging as regularization for stochastic gradient descent
This work addresses regularization in stochastic gradient methods, a topic relevant to machine learning practitioners, but its contribution is incremental because it builds on existing averaging schemes.
The authors tackle regularization in stochastic gradient descent by proposing a weighted averaging scheme with geometrically decaying weights. For linear least squares regression, they show that it has the same regularizing effect as ridge regression, and they derive finite-sample bounds that match the best known results for regularized stochastic gradient methods.
We propose and analyze a variant of the classic Polyak-Ruppert averaging scheme, broadly used in stochastic gradient methods. Rather than a uniform average of the iterates, we consider a weighted average, with weights decaying in a geometric fashion. In the context of linear least squares regression, we show that this averaging scheme has the same regularizing effect as ridge regression, and indeed is asymptotically equivalent to it. In particular, we derive finite-sample bounds for the proposed approach that match the best known results for regularized stochastic gradient methods.
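The averaging scheme can be illustrated with a minimal NumPy sketch. It rests on several assumptions that are not taken from the text above: iterate k receives weight proportional to (1 - eta * lam)^k, the iterates start at zero, the step size eta is constant, and the function names and parameters (`geometric_average_sgd`, `ridge_solution`, `eta`, `lam`, `batch_size`) are hypothetical. The exact weighting and parameterization analyzed by the authors may differ.

```python
import numpy as np


def geometric_average_sgd(X, y, eta, lam, n_steps, batch_size=1, seed=0):
    """Unregularized SGD on least squares with a geometrically weighted
    average of the iterates (illustrative sketch, not the authors' code).

    Assumption: iterate k gets weight proportional to (1 - eta * lam)**k.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)             # current iterate, started at zero
    weighted_sum = np.zeros(d)  # running sum of weight * iterate
    weight_total = 0.0          # running sum of weights, for normalization
    p = 1.0                     # weight of the current iterate
    for _ in range(n_steps):
        weighted_sum += p * w
        weight_total += p
        # Mini-batch gradient of the loss (1/2) * mean((x @ w - y)**2).
        idx = rng.choice(n, size=batch_size, replace=False)
        residual = X[idx] @ w - y[idx]
        grad = X[idx].T @ residual / batch_size
        w = w - eta * grad
        p *= 1.0 - eta * lam    # geometric decay of the weights
    return weighted_sum / weight_total


def ridge_solution(X, y, lam):
    """Minimizer of (1/(2n)) * ||X w - y||^2 + (lam/2) * ||w||^2."""
    n, d = X.shape
    return np.linalg.solve(X.T @ X / n + lam * np.eye(d), X.T @ y / n)
```

With `batch_size` equal to the number of samples the updates are deterministic, and under the assumptions of this sketch a short calculation shows that the weighted average converges to the ridge solution with parameter `lam / (1 - eta * lam)`, which is close to `lam` when `eta * lam` is small. With `batch_size=1` the result is random and only approximates the ridge solution.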