ML · LG · Feb 14, 2021

Double-descent curves in neural networks: a new perspective using Gaussian processes

arXiv:2102.07238v5 · 6 citations
Originality: Incremental advance
AI Analysis

This paper gives machine learning researchers a theoretical interpretation of double descent, but the advance is incremental because it builds on existing NNGP and random matrix theory frameworks.

The paper tackles the double-descent phenomenon in neural networks by connecting it to Gaussian processes and random matrix theory, showing that the generalization error is governed by the discrepancy between width-dependent empirical kernels and width-independent NNGP kernels.

Double-descent curves in neural networks describe the phenomenon that the generalisation error initially descends with increasing parameters, then grows after reaching an optimal number of parameters which is less than the number of data points, but then descends again in the overparameterized regime. In this paper, we use techniques from random matrix theory to characterize the spectral distribution of the empirical feature covariance matrix as a width-dependent perturbation of the spectrum of the neural network Gaussian process (NNGP) kernel, thus establishing a novel connection between the NNGP literature and the random matrix theory literature in the context of neural networks. Our analytical expression allows us to study the generalisation behavior of the corresponding kernel and GP regression, and provides a new interpretation of the double-descent phenomenon, namely as governed by the discrepancy between the width-dependent empirical kernel and the width-independent NNGP kernel.
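The discrepancy between the width-dependent empirical kernel and the width-independent NNGP kernel can be illustrated numerically. The sketch below is not the paper's derivation: it assumes a single hidden layer of ReLU random features with standard Gaussian weights, for which the infinite-width kernel has a known closed form (the order-1 arc-cosine kernel, up to a factor of 1/2). The function names `nngp_relu_kernel`, `empirical_kernel`, and `kernel_gap`, and the choice of input dimension and sample size, are hypothetical and chosen only for illustration.

```python
import math
import random


def relu(z):
    return z if z > 0.0 else 0.0


def nngp_relu_kernel(x, y):
    """Width-independent kernel E_w[relu(w.x) * relu(w.y)] for w ~ N(0, I).

    Assumption: one hidden ReLU layer with standard Gaussian weights.
    """
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    cos_t = sum(a * b for a, b in zip(x, y)) / (nx * ny)
    cos_t = max(-1.0, min(1.0, cos_t))  # guard against rounding outside [-1, 1]
    theta = math.acos(cos_t)
    return nx * ny * (math.sin(theta) + (math.pi - theta) * cos_t) / (2.0 * math.pi)


def empirical_kernel(X, width, rng):
    """Width-dependent kernel built from `width` random ReLU features."""
    d = len(X[0])
    W = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(width)]
    feats = [[relu(sum(wk * xk for wk, xk in zip(w, x))) for w in W] for x in X]
    n = len(X)
    return [
        [sum(fi * fj for fi, fj in zip(feats[i], feats[j])) / width for j in range(n)]
        for i in range(n)
    ]


def kernel_gap(X, width, rng):
    """Frobenius norm of (empirical kernel - NNGP kernel) on the inputs X."""
    K_emp = empirical_kernel(X, width, rng)
    n = len(X)
    total = 0.0
    for i in range(n):
        for j in range(n):
            diff = K_emp[i][j] - nngp_relu_kernel(X[i], X[j])
            total += diff * diff
    return math.sqrt(total)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical toy inputs: 6 points in 4 dimensions.
    X = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(6)]
    for width in (10, 100, 1000, 4000):
        print(width, kernel_gap(X, width, rng))
```

Under these assumptions the gap should shrink roughly like one over the square root of the width, the usual Monte Carlo rate, which is one simple way to see the empirical kernel as a width-dependent perturbation of the NNGP kernel.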

