MLLGOct 27, 2017

The Implicit Bias of Gradient Descent on Separable Data

arXiv:1710.10345v71118 citations
AI Analysis

This provides theoretical insight into implicit regularization in machine learning, explaining why optimization continues beyond zero training error, though it is incremental to prior work on optimization biases.

The paper proves that gradient descent on unregularized logistic regression with separable data converges to the max-margin SVM solution, generalizing to other losses and multi-class settings, and shows this convergence is slow (logarithmic in loss).

We examine gradient descent on unregularized logistic regression problems, with homogeneous linear predictors on linearly separable datasets. We show the predictor converges to the direction of the max-margin (hard margin SVM) solution. The result also generalizes to other monotone decreasing loss functions with an infimum at infinity, to multi-class problems, and to training a weight layer in a deep network in a certain restricted setting. Furthermore, we show this convergence is very slow, and only logarithmic in the convergence of the loss itself. This can help explain the benefit of continuing to optimize the logistic or cross-entropy loss even after the training error is zero and the training loss is extremely small, and, as we show, even if the validation loss increases. Our methodology can also aid in understanding implicit regularization n more complex models and with other optimization methods.

Code Implementations2 repos
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes