LG | AI | Jun 3, 2021

Optimization Variance: Exploring Generalization Properties of DNNs

arXiv:2106.01714v1 | 5 citations
Originality: Incremental advance
AI Analysis

This work addresses the problem of understanding and managing generalization in deep learning for researchers and practitioners, offering a method to reduce reliance on validation data, though it is incremental as it builds on existing bias-variance analysis.

The paper investigates epoch-wise double descent in deep neural networks, where the test error falls, rises, and falls again as the number of training epochs increases. It finds that the variance alone, without the bias, tracks the test error. The authors propose optimization variance (OV), a metric that can be estimated from training data only and correlates with the test error, which may enable early stopping without a validation set.

Unlike the conventional wisdom in statistical learning theory, the test error of a deep neural network (DNN) often demonstrates double descent: as the model complexity increases, it first follows a classical U-shaped curve and then shows a second descent. Through bias-variance decomposition, recent studies revealed that the bell-shaped variance is the major cause of model-wise double descent (when the DNN is widened gradually). This paper investigates epoch-wise double descent, i.e., the test error of a DNN also shows double descent as the number of training epochs increases. By extending the bias-variance analysis to epoch-wise double descent of the zero-one loss, we surprisingly find that the variance itself, without the bias, varies consistently with the test error. Inspired by this result, we propose a novel metric, optimization variance (OV), to measure the diversity of model updates caused by the stochastic gradients of random training batches drawn in the same iteration. OV can be estimated using samples from the training set only but correlates well with the (unknown) test error, and hence early stopping may be achieved without using a validation set.
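
The idea of measuring how much stochastic gradients from different random batches disagree at the same iteration can be illustrated with a small sketch. The abstract does not state the OV formula, so the code below is not the authors' definition: it assumes a normalized variance of mini-batch gradients, E||g - E[g]||^2 / E||g||^2, and applies it to the parameter gradient of a toy linear regression model rather than to a DNN. The names `batch_gradient` and `gradient_dispersion` are hypothetical.

```python
import numpy as np


def batch_gradient(w, X, y):
    """Gradient of the mean squared error of a linear model y ~ X @ w."""
    residual = X @ w - y
    return 2.0 * X.T @ residual / len(y)


def gradient_dispersion(w, X, y, batch_size=32, num_batches=20, seed=0):
    """Normalized variance of mini-batch gradients at fixed parameters w.

    Assumed form (not taken from the abstract):
        E||g - E[g]||^2 / E||g||^2
    estimated over random batches drawn at the same parameters.
    The result lies in [0, 1]; higher means the batches disagree more.
    """
    rng = np.random.default_rng(seed)
    grads = []
    for _ in range(num_batches):
        # Draw one random training batch without replacement.
        idx = rng.choice(len(y), size=batch_size, replace=False)
        grads.append(batch_gradient(w, X[idx], y[idx]))
    grads = np.stack(grads)                      # (num_batches, dim)
    mean_grad = grads.mean(axis=0)
    spread = np.mean(np.sum((grads - mean_grad) ** 2, axis=1))
    scale = np.mean(np.sum(grads ** 2, axis=1))
    return 0.0 if scale == 0.0 else float(spread / scale)


if __name__ == "__main__":
    # Synthetic training data only; no validation set is involved.
    rng = np.random.default_rng(1)
    X = rng.normal(size=(512, 5))
    w_true = rng.normal(size=5)
    y = X @ w_true + 0.1 * rng.normal(size=512)

    print("far from optimum:", gradient_dispersion(w_true + 10.0, X, y))
    print("near optimum:   ", gradient_dispersion(w_true, X, y))
```

In this toy setting the batches largely agree on the descent direction far from the optimum, so the ratio is small; near the optimum the gradients are dominated by noise and the ratio approaches 1. How the paper's OV behaves across epochs for a DNN is a separate empirical question that this sketch does not address.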

Code Implementations: 1 repo