LG · ML · Jan 14, 2022

The Implicit Regularization of Momentum Gradient Descent with Early Stopping

arXiv:2201.05405v1 · 1 citation
Originality: Incremental advance
AI Analysis

This work provides theoretical insights into optimization algorithms for machine learning practitioners, but it is incremental as it builds on prior studies of implicit regularization.

The paper characterizes the implicit regularization of momentum gradient descent with early stopping for least squares regression, showing that it tracks ridge regression more closely than gradient descent does. Under the calibration $t=\sqrt{2/\lambda}$, its risk is proven to be at most 1.54 times that of ridge.

The study of the implicit regularization induced by gradient-based optimization is a longstanding pursuit. In the present paper, we characterize the implicit regularization of momentum gradient descent (MGD) with early stopping by comparing it with explicit $\ell_2$-regularization (ridge). In detail, we study MGD in the continuous-time view, the so-called momentum gradient flow (MGF), and show that its tendency is closer to ridge than that of gradient descent (GD) [Ali et al., 2019] for least squares regression. Moreover, we prove that, under the calibration $t=\sqrt{2/\lambda}$, where $t$ is the time parameter in MGF and $\lambda$ is the tuning parameter in ridge regression, the risk of MGF is no more than 1.54 times that of ridge. In particular, the relative Bayes risk of MGF to ridge is between 1 and 1.035 under the optimal tuning. The numerical experiments strongly support our theoretical results.
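The calibration $t=\sqrt{2/\lambda}$ stated in the abstract can be illustrated with a minimal Python sketch. The function names below are hypothetical, and the $1/n$ normalization of the ridge objective is an assumption made for illustration; the abstract does not specify the scaling the paper uses. The sketch only converts between the two tuning parameters and computes a ridge estimate; it does not implement MGF itself.

```python
import math

import numpy as np


def mgf_time_from_ridge(lam: float) -> float:
    """Map a ridge penalty lam > 0 to an MGF stopping time via t = sqrt(2 / lam)."""
    if lam <= 0:
        raise ValueError("lam must be positive")
    return math.sqrt(2.0 / lam)


def ridge_from_mgf_time(t: float) -> float:
    """Inverse of the calibration: lam = 2 / t**2 for a stopping time t > 0."""
    if t <= 0:
        raise ValueError("t must be positive")
    return 2.0 / t**2


def ridge_estimate(X: np.ndarray, y: np.ndarray, lam: float) -> np.ndarray:
    """Ridge solution of (X^T X / n + lam * I) beta = X^T y / n.

    The 1/n scaling is an assumption of this sketch, not taken from the paper.
    """
    n, p = X.shape
    gram = X.T @ X / n
    return np.linalg.solve(gram + lam * np.eye(p), X.T @ y / n)
```

Under this calibration a heavier ridge penalty corresponds to an earlier stopping time: for example, $\lambda = 0.5$ maps to $t = 2$, while $\lambda = 2$ maps to $t = 1$.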

