ML, LG · Apr 6, 2015

Early Stopping is Nonparametric Variational Inference

arXiv:1504.01344v1 · 99 citations
Originality: Incremental advance
AI Analysis

This work offers a theoretical basis for common optimization tricks in machine learning and may lead to more overfitting-resistant training. It is an incremental advance because it builds on existing SGD and variational inference concepts.

The paper shows that unconverged stochastic gradient descent (SGD) can be interpreted as sampling from a nonparametric variational posterior. This interpretation yields a scalable, unbiased estimate of the variational lower bound on the log marginal likelihood, which can then be used to optimize hyperparameters without cross-validation. The same Bayesian view provides a theoretical foundation for techniques such as early stopping and ensembling, and the paper investigates it empirically on neural network models.

We show that unconverged stochastic gradient descent can be interpreted as a procedure that samples from a nonparametric variational approximate posterior distribution. This distribution is implicitly defined as the transformation of an initial distribution by a sequence of optimization updates. By tracking the change in entropy over this sequence of transformations during optimization, we form a scalable, unbiased estimate of the variational lower bound on the log marginal likelihood. We can use this bound to optimize hyperparameters instead of using cross-validation. This Bayesian interpretation of SGD suggests improved, overfitting-resistant optimization procedures, and gives a theoretical foundation for popular tricks such as early stopping and ensembling. We investigate the properties of this marginal likelihood estimator on neural network models.
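The entropy-tracking idea above can be made concrete with a minimal sketch. This is not the paper's estimator. It assumes the following:

- full-batch gradient ascent on log p(x, θ), rather than SGD;
- an initial Gaussian q_0 = N(0, σ0² I);
- an exact Hessian.

Each update θ_{t+1} = θ_t + α∇log p(x, θ_t) has Jacobian J_t = I + α∇²log p(x, θ_t). Each step therefore adds log|det J_t| to the entropy of the implied distribution. Adding log p(x, θ_T) to the tracked entropy gives a single-sample estimate of the bound log p(x) ≥ E_q[log p(x, θ)] + H[q].

Computing the determinant exactly costs O(d³), so this only works for tiny models. The paper's scalable approximation is not reproduced here. The helper names (`gaussian_entropy`, `run_and_track_entropy`) and the toy Gaussian model are hypothetical illustrations.

```python
import numpy as np


def gaussian_entropy(cov):
    """Differential entropy (nats) of a Gaussian with covariance `cov`."""
    d = cov.shape[0]
    _, logdet = np.linalg.slogdet(cov)
    return 0.5 * d * (1.0 + np.log(2.0 * np.pi)) + 0.5 * logdet


def run_and_track_entropy(grad_log_joint, hess_log_joint, theta0, entropy0,
                          alpha, num_steps):
    """Hypothetical helper: full-batch gradient ascent on log p(x, theta).

    Adds log|det J_t| to the entropy at every step, where
    J_t = I + alpha * Hessian(theta_t) is the Jacobian of the update.
    Assumes alpha keeps J_t nonsingular (the update must be invertible).
    """
    theta = theta0.copy()
    entropy = entropy0
    eye = np.eye(theta.size)
    for _ in range(num_steps):
        # Exact log|det J_t| evaluated at the current theta: O(d^3), tiny models only.
        _, logabsdet = np.linalg.slogdet(eye + alpha * hess_log_joint(theta))
        entropy += logabsdet
        theta = theta + alpha * grad_log_joint(theta)
    return theta, entropy


# Toy model (assumption): log p(x, theta) is a normalized Gaussian density with
# precision P, so the true log evidence log p(x) is exactly 0.
P = np.diag([1.0, 4.0])
d = P.shape[0]
_, logdet_P = np.linalg.slogdet(P)


def log_joint(theta):
    return -0.5 * theta @ P @ theta - 0.5 * d * np.log(2 * np.pi) + 0.5 * logdet_P


def grad_log_joint(theta):
    return -P @ theta


def hess_log_joint(theta):
    return -P  # constant Hessian for this quadratic toy model


sigma0, alpha, num_steps = 1.0, 0.1, 10
entropy0 = gaussian_entropy(sigma0**2 * np.eye(d))
rng = np.random.default_rng(0)

bounds = []
for _ in range(2000):
    theta0 = sigma0 * rng.standard_normal(d)  # sample from q_0 = N(0, sigma0^2 I)
    theta_T, tracked_entropy = run_and_track_entropy(
        grad_log_joint, hess_log_joint, theta0, entropy0, alpha, num_steps)
    bounds.append(log_joint(theta_T) + tracked_entropy)  # single-sample bound
mc_bound = float(np.mean(bounds))

# Closed-form check: the update is linear, so q_T = N(0, sigma0^2 A A^T).
A = np.linalg.matrix_power(np.eye(d) - alpha * P, num_steps)
cov_T = sigma0**2 * A @ A.T
pushed_entropy = gaussian_entropy(cov_T)
analytic_elbo = (-0.5 * np.trace(P @ cov_T) - 0.5 * d * np.log(2 * np.pi)
                 + 0.5 * logdet_P + pushed_entropy)
print(mc_bound, analytic_elbo)
```

Because the toy update is linear, the tracked entropy matches the entropy of the pushed-forward Gaussian exactly. The averaged bound also stays below the true log evidence of 0, as a variational lower bound should. With a nonquadratic objective, the Hessian changes along the trajectory. In that case, the per-step log-determinant depends on the sampled θ_0.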

Code Implementations: 1 repo
