LG · ML · Jun 10, 2021

Early-stopped neural networks are consistent

arXiv:2106.05932v2 · 51 citations
Originality: Incremental advance
AI Analysis

This paper provides theoretical guarantees for early stopping in neural network training, establishing consistency and calibration under general data distributions. The advance is incremental, but it matters to machine learning practitioners who rely on early stopping.

The paper studies shallow ReLU neural networks trained for binary classification when the Bayes risk is non-zero. It shows that gradient descent with early stopping achieves population risk arbitrarily close to optimal in logistic loss, misclassification loss, and calibration, with iteration, sample, and architectural complexities that scale with a complexity measure of the true conditional model.
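For reference, the three criteria named above can be written out as follows. The notation is an assumption of this note (labels in {-1, +1}, a real-valued predictor f, sigmoid φ) and may differ from the paper's own symbols.

```latex
\begin{align*}
\eta(x) &= \Pr[\,y = +1 \mid x\,]
  && \text{(true conditional model)} \\
\ell(z) &= \ln\bigl(1 + e^{-z}\bigr), \qquad
R(f) = \mathbb{E}\bigl[\ell\bigl(y\,f(x)\bigr)\bigr]
  && \text{(logistic risk)} \\
R_{0/1}(f) &= \Pr\bigl[\operatorname{sign}(f(x)) \neq y\bigr]
  && \text{(misclassification risk)} \\
\phi(z) &= \frac{1}{1 + e^{-z}}, \qquad
\phi\bigl(f(x)\bigr) \approx \eta(x)
  && \text{(calibration)}
\end{align*}
```

Where 0 < η(x) < 1, the logistic risk is minimized pointwise by f*(x) = ln(η(x) / (1 - η(x))), so that φ(f*(x)) = η(x). This is why closeness in logistic risk is naturally tied to calibration.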

This work studies the behavior of shallow ReLU networks trained with the logistic loss via gradient descent on binary classification data where the underlying data distribution is general, and the (optimal) Bayes risk is not necessarily zero. In this setting, it is shown that gradient descent with early stopping achieves population risk arbitrarily close to optimal in terms of not just logistic and misclassification losses, but also in terms of calibration, meaning the sigmoid mapping of its outputs approximates the true underlying conditional distribution arbitrarily finely. Moreover, the necessary iteration, sample, and architectural complexities of this analysis all scale naturally with a certain complexity measure of the true conditional model. Lastly, while it is not shown that early stopping is necessary, it is shown that any univariate classifier satisfying a local interpolation property is inconsistent.
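The setting in the abstract can be illustrated with a small NumPy sketch. It is not the paper's algorithm or analysis: the data distribution (Pr[y=+1 | x] = sigmoid(3x)), the fixed random outer signs with 1/sqrt(m) scaling, the step size, and all function names are assumptions made for illustration. In particular, the stopping time here is chosen by a held-out validation set, a common practical heuristic swapped in for whatever stopping rule the paper analyzes.

```python
import numpy as np

def sigmoid(z):
    # Numerically stable logistic function.
    return 0.5 * (1.0 + np.tanh(0.5 * z))

def make_data(n, rng):
    # Hypothetical noisy distribution: Pr[y=+1 | x] = sigmoid(3x), so Bayes risk > 0.
    x = rng.uniform(-1.0, 1.0, size=(n, 1))
    eta = sigmoid(3.0 * x[:, 0])
    y = np.where(rng.uniform(size=n) < eta, 1.0, -1.0)
    return x, y, eta

def predict(W, b, a, x):
    # Shallow ReLU network; outer signs a are fixed, output scaled by 1/sqrt(m).
    return np.maximum(x @ W.T + b, 0.0) @ a / np.sqrt(a.shape[0])

def logistic_risk(f, y):
    # Empirical logistic loss: mean of ln(1 + exp(-y f)).
    return float(np.mean(np.logaddexp(0.0, -y * f)))

def grads(W, b, a, x, y):
    # Gradient of the empirical logistic risk w.r.t. inner weights W and biases b.
    pre = x @ W.T + b
    f = np.maximum(pre, 0.0) @ a / np.sqrt(a.shape[0])
    g = -y * sigmoid(-y * f) / len(y)                      # d(risk) / d f_i
    back = g[:, None] * (pre > 0) * (a / np.sqrt(a.shape[0]))
    return back.T @ x, back.sum(axis=0)

def train_early_stopped(x_tr, y_tr, x_va, y_va, m=64, lr=0.5, steps=300, seed=0):
    # Full-batch gradient descent; keep the iterate with the lowest validation risk.
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(m, x_tr.shape[1]))
    b = rng.normal(size=m)
    a = rng.choice([-1.0, 1.0], size=m)
    best_val, best_t, best_W, best_b = np.inf, 0, W.copy(), b.copy()
    for t in range(steps + 1):
        val = logistic_risk(predict(W, b, a, x_va), y_va)
        if val < best_val:
            best_val, best_t, best_W, best_b = val, t, W.copy(), b.copy()
        gW, gb = grads(W, b, a, x_tr, y_tr)
        W -= lr * gW
        b -= lr * gb
    return best_W, best_b, a, best_t, best_val

rng = np.random.default_rng(0)
x_tr, y_tr, _ = make_data(500, rng)
x_va, y_va, _ = make_data(500, rng)
x_te, y_te, eta_te = make_data(2000, rng)

W, b, a, t_stop, val = train_early_stopped(x_tr, y_tr, x_va, y_va)
f_te = predict(W, b, a, x_te)
print("stopped at step:", t_stop)
print("test logistic risk:", logistic_risk(f_te, y_te))
print("test misclassification rate:", float(np.mean(np.sign(f_te) != y_te)))
print("mean calibration gap:", float(np.mean(np.abs(sigmoid(f_te) - eta_te))))
```

The last printed quantity compares the sigmoid of the network output with the known conditional probability, which is only possible here because the synthetic distribution is specified by hand.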

