ML · LG · Oct 23, 2018

A Continuous-Time View of Early Stopping for Least Squares

arXiv:1810.10082v4 · 111 citations
Originality: Incremental advance
AI Analysis

The paper offers theoretical insight into early stopping for machine learning practitioners, though the advance is incremental because it builds on existing ridge regression analysis.

The paper compares the statistical risk of gradient flow (continuous-time gradient descent) with that of ridge regression in least squares. It proves that, under the calibration $t = 1/\lambda$, the risk of gradient flow is at most 1.69 times that of ridge, with minimal assumptions on the data.

We study the statistical properties of the iterates generated by gradient descent, applied to the fundamental problem of least squares regression. We take a continuous-time view, i.e., consider infinitesimal step sizes in gradient descent, in which case the iterates form a trajectory called gradient flow. Our primary focus is to compare the risk of gradient flow to that of ridge regression. Under the calibration $t=1/\lambda$---where $t$ is the time parameter in gradient flow, and $\lambda$ the tuning parameter in ridge regression---we prove that the risk of gradient flow is no more than 1.69 times that of ridge, along the entire path (for all $t \geq 0$). This holds in finite samples with very weak assumptions on the data model (in particular, with no assumptions on the features $X$). We prove that the same relative risk bound holds for prediction risk, in an average sense over the underlying signal $\beta_0$. Finally, we examine limiting risk expressions (under standard Marchenko-Pastur asymptotics), and give supporting numerical experiments.
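The relative risk bound can be checked numerically on a small synthetic problem. The sketch below is illustrative only and rests on assumptions not spelled out in the abstract: the least squares loss is scaled as $\frac{1}{2n}\|y - X\beta\|_2^2$, gradient flow starts at $\beta(0) = 0$, ridge minimizes $\frac{1}{n}\|y - X\beta\|_2^2 + \lambda\|\beta\|_2^2$, and the noise is i.i.d. with variance $\sigma^2$. Under those assumptions both estimators shrink along the eigenvectors of $X^T X / n$, so the estimation risk (squared bias plus variance) has a closed form. The function name `estimation_risks` and the choice of `n`, `p`, and `sigma` are hypothetical.

```python
import numpy as np

def estimation_risks(X, beta0, sigma, t):
    """Exact estimation risk of gradient flow at time t and ridge at lam = 1/t.

    Assumes y = X @ beta0 + noise with noise variance sigma**2, and the
    loss scalings described in the lead-in (an assumption, not quoted
    from the abstract).
    """
    n = X.shape[0]
    # Eigen-decomposition of the sample covariance X^T X / n.
    s, V = np.linalg.eigh(X.T @ X / n)
    s = np.clip(s, 0.0, None)          # guard against tiny negative values
    a2 = (V.T @ beta0) ** 2            # squared signal along each eigenvector
    lam = 1.0 / t                      # calibration t = 1 / lambda
    pos = s > 1e-12                    # directions that carry variance

    # Gradient flow: shrinkage factor 1 - exp(-t * s_i).
    bias_gf = np.sum(np.exp(-2.0 * t * s) * a2)
    var_gf = sigma**2 / n * np.sum((1.0 - np.exp(-t * s[pos])) ** 2 / s[pos])

    # Ridge: shrinkage factor s_i / (s_i + lam).
    bias_ridge = np.sum(lam**2 / (s + lam) ** 2 * a2)
    var_ridge = sigma**2 / n * np.sum(s[pos] / (s[pos] + lam) ** 2)

    return bias_gf + var_gf, bias_ridge + var_ridge

rng = np.random.default_rng(0)
n, p, sigma = 100, 20, 1.0             # hypothetical problem size and noise level
X = rng.standard_normal((n, p))
beta0 = rng.standard_normal(p)

# Ratio of gradient flow risk to ridge risk along a grid of times t > 0.
ts = np.logspace(-3, 3, 50)
ratios = np.array([
    gf / ridge
    for gf, ridge in (estimation_risks(X, beta0, sigma, t) for t in ts)
])
print("largest risk ratio on the grid:", ratios.max())
```

On any such grid the largest ratio should stay below 1.69, in line with the bound stated in the paper. The sketch covers estimation risk for a fixed $\beta_0$ only; the prediction risk result, which holds on average over $\beta_0$, is not reproduced here.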

