Benefits of Early Stopping in Gradient Descent for Overparameterized Logistic Regression
This work addresses the statistical inconsistency of asymptotic gradient descent in overparameterized logistic regression, an issue relevant to machine learning practitioners, and gives a clear separation between early-stopped and asymptotic methods. Its connection between implicit and explicit regularization is more incremental.
The paper studies well-specified, overparameterized logistic regression and shows that early-stopped gradient descent achieves vanishing excess logistic risk and needs only polynomially many samples to reach a small excess zero-one risk. In contrast, the excess logistic risk of asymptotic gradient descent diverges, and asymptotic gradient descent, like any interpolating estimator, needs exponentially many samples.
In overparameterized logistic regression, gradient descent (GD) iterates diverge in norm while converging in direction to the maximum $\ell_2$-margin solution -- a phenomenon known as the implicit bias of GD. This work investigates additional regularization effects induced by early stopping in well-specified high-dimensional logistic regression. We first demonstrate that the excess logistic risk vanishes for early-stopped GD but diverges to infinity for GD iterates at convergence. This suggests that early-stopped GD is well-calibrated, whereas asymptotic GD is statistically inconsistent. Second, we show that to attain a small excess zero-one risk, polynomially many samples are sufficient for early-stopped GD, while exponentially many samples are necessary for any interpolating estimator, including asymptotic GD. This separation underscores the statistical benefits of early stopping in the overparameterized regime. Finally, we establish nonasymptotic bounds on the norm and angular differences between early-stopped GD and $\ell_2$-regularized empirical risk minimizer, thereby connecting the implicit regularization of GD with explicit $\ell_2$-regularization.
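The contrast between early-stopped and long-run GD described above can be explored with a small simulation. The following is a minimal sketch, not the paper's experimental setup: the dimensions, the choice of `w_star`, the Gaussian design, the step size, the seed, and the use of a held-out set to pick a stopping step are all assumptions made for illustration. It runs GD from zero on a well-specified logistic model with more parameters than samples and records the iterate norm, the training loss, and a held-out estimate of the logistic risk along the path.

```python
import numpy as np

def logistic_loss(w, X, y):
    """Mean logistic loss log(1 + exp(-y * <x, w>)), computed stably."""
    return float(np.mean(np.logaddexp(0.0, -y * (X @ w))))

def sample(rng, n, w_star):
    """Well-specified data: x ~ N(0, I), P(y = +1 | x) = sigmoid(<x, w_star>)."""
    X = rng.standard_normal((n, w_star.size))
    p = 0.5 * (1.0 + np.tanh(0.5 * (X @ w_star)))  # sigmoid(<x, w_star>)
    y = np.where(rng.random(n) < p, 1.0, -1.0)
    return X, y

def run_gd(X, y, X_val, y_val, steps):
    """GD from zero; entry t of each returned array describes the iterate after t steps."""
    n, d = X.shape
    # Step size 1/L, where L = sigma_max(X)^2 / (4n) bounds the Hessian norm,
    # so the training loss cannot increase from one step to the next.
    lr = 4.0 * n / np.linalg.norm(X, 2) ** 2
    w = np.zeros(d)
    norms, train, val = [], [], []
    for _ in range(steps + 1):
        norms.append(float(np.linalg.norm(w)))
        train.append(logistic_loss(w, X, y))
        val.append(logistic_loss(w, X_val, y_val))
        margins = y * (X @ w)
        weights = 0.5 * (1.0 - np.tanh(0.5 * margins))  # sigmoid(-margin)
        grad = -(X * (y * weights)[:, None]).mean(axis=0)
        w = w - lr * grad
    return np.array(norms), np.array(train), np.array(val)

# Illustrative constants (assumptions, not taken from the paper).
rng = np.random.default_rng(0)
d, n = 200, 50                      # overparameterized: d > n
w_star = np.zeros(d)
w_star[0] = 2.0
X, y = sample(rng, n, w_star)
X_val, y_val = sample(rng, 2000, w_star)

norms, train, val = run_gd(X, y, X_val, y_val, steps=3000)
best = int(np.argmin(val))          # held-out choice of stopping step
print(f"best step {best}: norm {norms[best]:.2f}, held-out loss {val[best]:.3f}")
print(f"last step {len(val) - 1}: norm {norms[-1]:.2f}, held-out loss {val[-1]:.3f}")
```

Because the Gaussian design with d > n makes the training data linearly separable almost surely, the training loss keeps shrinking while the iterate norm keeps growing. How pronounced the gap in held-out loss is depends on the assumed constants, so the printed numbers are best read as a qualitative illustration rather than a reproduction of the paper's bounds.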