LG Apr 7, 2020

A Brief Prehistory of Double Descent

Marco Loog, Tom Viering, Alexander Mey, Jesse H. Krijthe, David M. J. Tax

arXiv:2004.04328v121.576 citations

Originality: Synthesis-oriented

AI Analysis

This work clarifies historical precedence for a phenomenon in machine learning theory. Its contribution is incremental and mainly of interest to researchers following the evolution of ideas.

The paper addresses the historical context of double descent risk curves in machine learning, pointing out that earlier findings predate the recent discussion of the phenomenon. It does not present new experimental results or concrete numbers.

In their thought-provoking paper [1], Belkin et al. illustrate and discuss the shape of risk curves in the context of modern high-complexity learners. Given a fixed training sample size $n$, such curves show the risk of a learner as a function of some (approximate) measure of its complexity $N$. With $N$ the number of features, these curves are also referred to as feature curves. A salient observation in [1] is that these curves can display what they call double descent: with increasing $N$, the risk initially decreases, attains a minimum, and then increases until $N$ equals $n$, where the training data is fitted perfectly. Increasing $N$ even further, the risk decreases a second and final time, creating a peak at $N=n$. This twofold descent may come as a surprise, but contrary to what [1] reports, it has not been overlooked historically. Our letter draws attention to some original, earlier findings, of interest to contemporary machine learning.
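The shape of such a feature curve can be reproduced with a small simulation. The sketch below is an illustration only and is not taken from the letter or from [1]: it assumes Gaussian features, a linear target whose coefficients decay as $1/j$, additive Gaussian noise, and a least-squares learner that falls back to the minimum-norm solution once $N > n$. The function name `feature_curve` and all parameter values are hypothetical choices.

```python
import numpy as np

def feature_curve(n=20, total_features=60, noise_std=0.5, trials=100, seed=0):
    """Average test risk of (min-norm) least squares using the first N features.

    Assumed setup (illustrative, not from the paper): i.i.d. standard
    Gaussian features and a linear target with coefficients 1/j.
    """
    rng = np.random.default_rng(seed)
    n_test = 500
    beta = 1.0 / np.arange(1, total_features + 1)  # earlier features matter more
    risks = np.full(total_features + 1, np.nan)    # risks[N]; index 0 is unused
    risks[1:] = 0.0
    for _ in range(trials):
        X = rng.normal(size=(n, total_features))
        X_test = rng.normal(size=(n_test, total_features))
        y = X @ beta + noise_std * rng.normal(size=n)
        y_test = X_test @ beta + noise_std * rng.normal(size=n_test)
        for N in range(1, total_features + 1):
            # The pseudo-inverse gives ordinary least squares for N <= n and
            # the minimum-norm interpolating solution for N > n.
            w = np.linalg.pinv(X[:, :N]) @ y
            risks[N] += np.mean((X_test[:, :N] @ w - y_test) ** 2)
    risks[1:] /= trials
    return risks

n = 20
risks = feature_curve(n=n)
for N in (1, 5, 10, 15, 20, 25, 40, 60):
    print(f"N={N:2d}  estimated risk={risks[N]:.3f}")
```

Under these assumptions, the estimated risk should first fall, then rise sharply as $N$ approaches $n = 20$, and fall again for $N > n$. The exact values depend on the seed and on the assumed coefficient decay, so the output is best read qualitatively.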
