Learning with incremental iterative regularization
This work addresses a foundational question in machine learning optimization and provides theoretical insight into incremental gradient methods, although the contribution itself is a modest step beyond prior work.
The paper studies how multiple passes over the data regularize an incremental gradient method for least squares. It shows that, with all other parameters fixed in advance, the number of epochs acts as a regularization parameter, and it proves strong universal consistency together with sharp finite sample bounds.
Within a statistical learning setting, we propose and study an iterative regularization algorithm for least squares defined by an incremental gradient method. In particular, we show that, if all other parameters are fixed a priori, the number of passes over the data (epochs) acts as a regularization parameter, and prove strong universal consistency, i.e. almost sure convergence of the risk, as well as sharp finite sample bounds for the iterates. Our results are a step towards understanding the effect of multiple epochs in stochastic gradient techniques in machine learning and rely on integrating statistical and optimization results.
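To make the role of the number of epochs concrete, the following is a minimal sketch of a cyclic incremental gradient method for least squares on synthetic data. It is an illustration under stated assumptions, not the paper's exact algorithm or analysis: the function name `incremental_gradient_least_squares`, the step size scaling `step_size / n`, the synthetic data model, and the use of a held-out set to pick the stopping epoch are all choices made here for the example.

```python
import numpy as np


def incremental_gradient_least_squares(X, y, step_size, n_epochs):
    """Cyclic incremental gradient for least squares.

    Visits the samples in a fixed order, taking one gradient step per
    sample, and returns the iterate reached at the end of every epoch.
    The per-sample step step_size / n is an assumption of this sketch.
    """
    n, d = X.shape
    w = np.zeros(d)
    iterates = []
    for _ in range(n_epochs):
        for i in range(n):
            residual = X[i] @ w - y[i]
            # Gradient of 0.5 * (x_i . w - y_i)^2 with respect to w.
            w = w - (step_size / n) * residual * X[i]
        iterates.append(w.copy())
    return iterates


def mse(X, y, w):
    """Mean squared prediction error of the linear model w."""
    return float(np.mean((X @ w - y) ** 2))


# Synthetic, noisy, over-parameterized problem (hypothetical setup).
rng = np.random.default_rng(0)
n, d, n_val = 50, 100, 500
w_true = rng.normal(size=d)
X = rng.normal(size=(n, d)) / np.sqrt(d)          # rows have norm close to 1
y = X @ w_true + 0.5 * rng.normal(size=n)
X_val = rng.normal(size=(n_val, d)) / np.sqrt(d)
y_val = X_val @ w_true + 0.5 * rng.normal(size=n_val)

n_epochs = 200
iterates = incremental_gradient_least_squares(X, y, step_size=1.0, n_epochs=n_epochs)

# All other parameters stay fixed; only the stopping epoch is tuned.
train_errors = [mse(X, y, w) for w in iterates]
val_errors = [mse(X_val, y_val, w) for w in iterates]
best_epoch = int(np.argmin(val_errors)) + 1

print(f"best epoch on held-out data: {best_epoch} of {n_epochs}")
print(f"training error after last epoch: {train_errors[-1]:.4f}")
```

Here the step size is held fixed and the only quantity being tuned is the epoch at which the iteration stops, which is the sense in which the number of passes plays the role of a regularization parameter. Choosing that epoch on held-out data is one common practical heuristic and is used only to illustrate the idea.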