ML, DIS-NN, LG | Feb 28, 2025

Dynamical Decoupling of Generalization and Overfitting in Large Two-Layer Networks

arXiv:2502.21269v3 | 27 citations | h-index: 27
Originality: Incremental advance
AI Analysis

This work provides theoretical insights into generalization and overfitting in overparametrized models. The advance is incremental, but it addresses a fundamental problem in machine learning theory.

The authors study the training dynamics of large two-layer neural networks using dynamical mean field theory. They find a separation of timescales that produces an inductive bias towards small complexity and a non-monotone test error, the latter due to feature unlearning at large times.

Understanding the inductive bias and generalization properties of large overparametrized machine learning models requires characterizing the dynamics of the training algorithm. We study the learning dynamics of large two-layer neural networks via dynamical mean field theory, a well-established technique of non-equilibrium statistical physics. We show that, for large network width $m$ and a large number of samples per input dimension $n/d$, the training dynamics exhibit a separation of timescales which implies: (i) the emergence of a slow time scale associated with the growth in Gaussian/Rademacher complexity of the network; (ii) an inductive bias towards small complexity if the initialization has small enough complexity; (iii) a dynamical decoupling between feature learning and overfitting regimes; (iv) a non-monotone behavior of the test error, associated with a 'feature unlearning' regime at large times.
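The setting in the abstract can be explored numerically with a small experiment. The sketch below is not the paper's dynamical mean field theory calculation and is not taken from the paper. It is a minimal illustration that trains a width-$m$ two-layer network by full-batch gradient descent on noisy data and records train error, test error, and a complexity proxy over time. Everything specific in it is an assumption made for illustration: the `tanh` activation, the single-index teacher, the mean-field $1/m$ output scaling, the path-norm-style proxy standing in for Gaussian/Rademacher complexity, the function name `run_two_layer`, and all hyperparameters.

```python
import numpy as np


def run_two_layer(d=20, n=200, m=100, steps=300, lr=0.2,
                  noise=0.5, record_every=10, seed=0):
    """Train f(x) = (1/m) * sum_i a_i * tanh(<w_i, x>) by gradient descent.

    Returns a list of dicts with train/test error and a complexity proxy.
    All modelling choices here are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)

    # Hypothetical single-index teacher with unit-norm direction w_star.
    w_star = rng.standard_normal(d)
    w_star /= np.linalg.norm(w_star)

    def sample(k):
        X = rng.standard_normal((k, d))
        y = np.tanh(X @ w_star) + noise * rng.standard_normal(k)
        return X, y

    X_train, y_train = sample(n)
    X_test, y_test = sample(2000)

    # Small second-layer weights, so the network starts with small complexity.
    W = rng.standard_normal((m, d)) / np.sqrt(d)
    a = 0.1 * rng.standard_normal(m)

    history = []
    for t in range(steps + 1):
        H = np.tanh(X_train @ W.T)            # hidden activations, shape (n, m)
        resid = H @ a / m - y_train           # training residuals, shape (n,)

        if t % record_every == 0:
            test_pred = np.tanh(X_test @ W.T) @ a / m
            history.append({
                "step": t,
                "train_mse": float(np.mean(resid ** 2)),
                "test_mse": float(np.mean((test_pred - y_test) ** 2)),
                # Proxy for complexity (assumption): mean of |a_i| * ||w_i||.
                "path_norm": float(np.mean(np.abs(a) * np.linalg.norm(W, axis=1))),
            })

        # Gradients of 0.5 * mean(resid^2), multiplied by m (mean-field time scaling).
        grad_a = H.T @ resid / n                                  # shape (m,)
        grad_W = (resid[:, None] * (1.0 - H ** 2) * a[None, :]).T @ X_train / n  # (m, d)
        a -= lr * grad_a
        W -= lr * grad_W

    return history


if __name__ == "__main__":
    for row in run_two_layer():
        print(row)
```

Plotting `test_mse` and `path_norm` against `step` on a logarithmic time axis is one way to look for the qualitative picture described above. Whether a non-monotone test error or a clear separation of timescales is visible at these small sizes depends on the chosen parameters and is not guaranteed by this sketch.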

