ML, LG | Mar 14, 2022

Phenomenology of Double Descent in Finite-Width Neural Networks

Sidak Pal Singh, Aurelien Lucchi, Thomas Hofmann, Bernhard Schölkopf

ETH Zurich

arXiv:2203.07337v1 | 15.613 citations | h-index: 169

Originality: Incremental advance

AI Analysis

This work addresses a gap in the theoretical understanding of double descent in neural networks. The advance is incremental in that it builds on prior analyses focused on linear and kernel regression models.

The authors study double descent in finite-width neural networks. They derive expressions for the population loss and a lower bound on it that exhibit double descent at the interpolation threshold, and they investigate how the choice of loss function affects the phenomenon.

'Double descent' delineates the generalization behaviour of models depending on the regime they belong to: under- or over-parameterized. The current theoretical understanding behind the occurrence of this phenomenon is primarily based on linear and kernel regression models -- with informal parallels to neural networks via the Neural Tangent Kernel. Therefore, such analyses do not adequately capture the mechanisms behind double descent in finite-width neural networks, and they disregard crucial components -- such as the choice of the loss function. We address these shortcomings by leveraging influence functions in order to derive suitable expressions of the population loss and its lower bound, while imposing minimal assumptions on the form of the parametric model. Our derived bounds bear an intimate connection with the spectrum of the Hessian at the optimum, and importantly, exhibit a double descent behaviour at the interpolation threshold. Building on our analysis, we further investigate how the loss function affects double descent -- and thus uncover interesting properties of neural networks and their Hessian spectra near the interpolation threshold.
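
The link between the Hessian spectrum and the loss peak at the interpolation threshold can be illustrated in the simplest setting, which is not the one the paper analyses. The sketch below is a toy linear-regression experiment, not the authors' influence-function derivation and not a finite-width network. The data model, the function name `sweep`, and all parameter values are assumptions chosen for illustration. For the squared loss, the Hessian of the training objective is X^T X / n, and its smallest nonzero eigenvalue shrinks sharply when the number of parameters p approaches the number of training samples n, which is where the test error peaks.

```python
import numpy as np

def sweep(n_train=40, d_total=120, widths=(10, 20, 40, 80, 120),
          noise_std=0.5, n_test=1000, trials=20, seed=0):
    """Return {p: (median test MSE, median smallest nonzero Hessian eigenvalue)}."""
    rng = np.random.default_rng(seed)
    results = {}
    for p in widths:
        losses, min_eigs = [], []
        for _ in range(trials):
            # Hypothetical data model: linear target over d_total Gaussian features.
            w_true = rng.normal(size=d_total) / np.sqrt(d_total)
            X = rng.normal(size=(n_train, d_total))
            X_te = rng.normal(size=(n_test, d_total))
            y = X @ w_true + noise_std * rng.normal(size=n_train)
            y_te = X_te @ w_true + noise_std * rng.normal(size=n_test)

            # The model only sees the first p features, so p is its parameter count.
            Xp, Xp_te = X[:, :p], X_te[:, :p]

            # pinv gives the least-squares fit for p <= n_train and the
            # minimum-norm interpolant for p > n_train.
            w_hat = np.linalg.pinv(Xp) @ y
            losses.append(np.mean((Xp_te @ w_hat - y_te) ** 2))

            # Hessian of (1/(2n)) * ||Xp w - y||^2 is Xp^T Xp / n; its nonzero
            # eigenvalues are the squared singular values of Xp divided by n.
            s = np.linalg.svd(Xp, compute_uv=False)
            min_eigs.append(s.min() ** 2 / n_train)
        results[p] = (float(np.median(losses)), float(np.median(min_eigs)))
    return results

if __name__ == "__main__":
    for p, (loss, eig) in sweep().items():
        print(f"p={p:4d}  test MSE={loss:10.3f}  min nonzero Hessian eig={eig:.5f}")
```

Medians over trials are reported because the test error at p = n is heavy-tailed, so a mean would be dominated by a few extreme draws. Under these assumptions the test error should be largest, and the smallest nonzero Hessian eigenvalue smallest, at p = n_train.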
