NANAOct 18, 2018

On the Regularizing Property of Stochastic Gradient Descent

arXiv:1805.10470
AI Analysis

Provides theoretical justification for a widely observed empirical phenomenon in machine learning, relevant to practitioners using SGD for large-scale inverse problems.

The paper rigorously proves that stochastic gradient descent with early stopping has a regularizing property for linear inverse problems, establishing convergence rates under a sourcewise condition.

Stochastic gradient descent is one of the most successful approaches for solving large-scale problems, especially in machine learning and statistics. At each iteration, it employs an unbiased estimator of the full gradient computed from one single randomly selected data point. Hence, it scales well with problem size and is very attractive for truly massive dataset, and holds significant potentials for solving large-scale inverse problems. In the recent literature of machine learning, it was empirically observed that when equipped with early stopping, it has regularizing property. In this work, we rigorously establish its regularizing property (under \textit{a priori} early stopping rule), and also prove convergence rates under the canonical sourcewise condition, for minimizing the quadratic functional for linear inverse problems. This is achieved by combining tools from classical regularization theory and stochastic analysis. Further, we analyze the preasymptotic weak and strong convergence behavior of the algorithm. The theoretical findings shed insights into the performance of the algorithm, and are complemented with illustrative numerical experiments.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes