LG · ML · Oct 22, 2021

Model, sample, and epoch-wise descents: exact solution of gradient flow in the random feature model

arXiv:2110.11805v1 · 16 citations
Originality: Incremental advance
AI Analysis

This work offers incremental insight into double- and triple-descent phenomena in neural networks, and is aimed at researchers studying optimization and generalization in machine learning models.

The authors study how the generalization and training errors of the random feature model evolve over time under gradient flow. They show that, in the asymptotic limit of large system size, the full time-evolution path of both errors can be calculated analytically. This reveals how double and triple descents develop over time and indicates if and when early stopping is an option.

Recent evidence has shown the existence of a so-called double-descent and even triple-descent behavior for the generalization error of deep-learning models. This important phenomenon commonly appears in implemented neural network architectures, and also seems to emerge in epoch-wise curves during the training process. A recent line of research has highlighted that random matrix tools can be used to obtain precise analytical asymptotics of the generalization (and training) errors of the random feature model. In this contribution, we analyze the whole temporal behavior of the generalization and training errors under gradient flow for the random feature model. We show that in the asymptotic limit of large system size the full time-evolution path of both errors can be calculated analytically. This allows us to observe how the double and triple descents develop over time, if and when early stopping is an option, and also observe time-wise descent structures. Our techniques are based on Cauchy complex integral representations of the errors together with recent random matrix methods based on linear pencils.
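The setting described in the abstract can be explored numerically at finite size. The sketch below is not the paper's asymptotic calculation (which relies on Cauchy integral representations and linear pencils); it is a minimal simulation under stated assumptions: a tanh activation, a linear teacher with Gaussian noise, zero initialization, a small ridge penalty `lam`, and a 1/sqrt(N) feature scaling. All names (`gradient_flow_errors`, `lam`, `noise`, and so on) are hypothetical. Because the loss is quadratic in the second-layer weights, gradient flow has a closed-form solution in the eigenbasis of the feature covariance, so no time-stepping is needed.

```python
import numpy as np

def gradient_flow_errors(n=200, d=100, N=150, lam=1e-3, noise=0.1,
                         times=(0.1, 1.0, 10.0, 100.0), n_test=2000, seed=0):
    """Train/test MSE of random-feature regression along gradient flow.

    Assumed loss: L(a) = ||Z a - y||^2 / (2 n) + lam * ||a||^2 / 2, with a(0) = 0.
    """
    rng = np.random.default_rng(seed)
    beta = rng.standard_normal(d)        # assumed linear teacher
    W = rng.standard_normal((N, d))      # fixed random first-layer weights

    def features(X):
        # Random features with an assumed tanh activation and 1/sqrt(N) scaling.
        return np.tanh(X @ W.T / np.sqrt(d)) / np.sqrt(N)

    def sample(m):
        X = rng.standard_normal((m, d))
        y = X @ beta / np.sqrt(d) + noise * rng.standard_normal(m)
        return features(X), y

    Z, y = sample(n)
    Z_test, y_test = sample(n_test)

    # Gradient flow: da/dt = -(K + lam I) a + b, with K = Z^T Z / n, b = Z^T y / n.
    K = Z.T @ Z / n
    b = Z.T @ y / n
    s, V = np.linalg.eigh(K)
    rate = np.clip(s, 0.0, None) + lam   # decay rate of each eigenmode
    c = V.T @ b

    times = np.asarray(times, dtype=float)
    train = np.empty_like(times)
    test = np.empty_like(times)
    safe = np.where(rate > 1e-12, rate, 1.0)
    for k, t in enumerate(times):
        # Mode-wise solution (1 - exp(-rate t)) / rate; its rate -> 0 limit is t.
        filt = np.where(rate > 1e-12, -np.expm1(-rate * t) / safe, t)
        a = V @ (filt * c)
        train[k] = np.mean((Z @ a - y) ** 2)
        test[k] = np.mean((Z_test @ a - y_test) ** 2)
    return times, train, test

if __name__ == "__main__":
    for t, tr, te in zip(*gradient_flow_errors()):
        print(f"t={t:8.1f}  train={tr:.4f}  test={te:.4f}")
```

Calling the function over a grid of `N` values at several fixed times gives finite-size analogues of the model-wise curves at different stages of training, while a dense `times` grid at fixed `N` gives an epoch-wise curve. Whether descent structures are visible depends on the assumed sizes, noise level, and regularization.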

