ML · LG · Apr 17, 2025

Propagation of Chaos in One-hidden-layer Neural Networks beyond Logarithmic Time

arXiv:2504.13110v3 · 12.34 citations · h-index: 19 · Has Code

Originality: Incremental advance

AI Analysis

Aimed at researchers studying the theory of neural network training dynamics, this work offers incremental insight into feature learning in settings where convergence times grow polynomially.
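The abstract ties these polynomial convergence times to the information exponent of the target link function. A common definition, assumed here, is the smallest $k \ge 1$ whose coefficient in the link's expansion in probabilists' Hermite polynomials (under a standard Gaussian input) is nonzero. The sketch below computes it exactly for polynomial links using Gaussian moments; all function names are hypothetical and not taken from the paper's code.

```python
from fractions import Fraction
from math import factorial
from typing import List, Optional


def gaussian_moment(n: int) -> int:
    """E[z**n] for z ~ N(0, 1): zero for odd n, (n - 1)!! for even n."""
    if n % 2 == 1:
        return 0
    result = 1
    for k in range(n - 1, 0, -2):
        result *= k
    return result


def hermite_poly(k: int) -> List[int]:
    """Coefficients (lowest degree first) of the probabilists' Hermite polynomial He_k."""
    prev, curr = [1], [0, 1]  # He_0 = 1, He_1 = z
    if k == 0:
        return prev
    for j in range(1, k):
        # Recurrence: He_{j+1}(z) = z * He_j(z) - j * He_{j-1}(z)
        nxt = [0] + curr
        for i, c in enumerate(prev):
            nxt[i] -= j * c
        prev, curr = curr, nxt
    return curr


def hermite_coefficient(link: List[int], k: int) -> Fraction:
    """c_k = E[link(z) * He_k(z)] / k! for a polynomial link given lowest degree first."""
    he = hermite_poly(k)
    total = sum(a * b * gaussian_moment(i + j)
                for i, a in enumerate(link) for j, b in enumerate(he))
    return Fraction(total, factorial(k))


def information_exponent(link: List[int]) -> Optional[int]:
    """Smallest k >= 1 with c_k != 0, or None for a constant link."""
    for k in range(1, len(link)):
        if hermite_coefficient(link, k) != 0:
            return k
    return None


print(information_exponent([0, -3, 0, 1]))  # He_3(z) = z**3 - 3z -> 3
print(information_exponent([0, 0, 0, 1]))   # z**3 -> 1
```

The two examples show why the exponent depends on the whole expansion rather than the degree: $z^3 = \mathrm{He}_3(z) + 3\,\mathrm{He}_1(z)$ already has a first-order component, whereas $\mathrm{He}_3$ alone does not.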

The paper studies the approximation gap between finite-width and infinite-width neural networks trained with projected gradient descent in the mean-field regime. It shows that, owing to a self-concordance property of single-index model problems, polynomially many neurons suffice to closely approximate the mean-field dynamics.
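To make the finite- versus infinite-width comparison concrete, here is a toy sketch, not the authors' code. It assumes fixed second-layer weights $1/m$, $\mathrm{He}_2$ units in both student and teacher (information exponent 2, far below the arbitrary exponents the paper permits), Gaussian inputs, neurons constrained to the unit sphere, and a moderately wide network as a rough stand-in for the mean-field limit. With quadratic Hermite units the population loss reduces to $\|C - \theta\theta^\top\|_F^2$ with $C = \frac{1}{m}\sum_j w_j w_j^\top$, so no sampling is needed. The names `train_width` and `max_loss_gap` are hypothetical.

```python
import math
import random


def train_width(m, d=6, steps=250, lr=0.02, seed=0):
    """Projected gradient descent on the sphere for a width-m toy network.

    Assumed toy setup (hypothetical, not the paper's code):
      student f(x) = (1/m) * sum_j He_2(<w_j, x>), each w_j on the unit sphere,
      teacher y = He_2(<theta, x>), input x ~ N(0, I_d).
    Here 0.5 * E[(f - y)^2] = ||C - theta theta^T||_F^2, C = (1/m) * sum_j w_j w_j^T.
    Returns the loss at every step (including step 0) and the final neurons.
    """
    rng = random.Random(seed)
    theta = [1.0] + [0.0] * (d - 1)  # target direction e_1

    def normalize(v):
        n = math.sqrt(sum(c * c for c in v))
        return [c / n for c in v]

    W = [normalize([rng.gauss(0.0, 1.0) for _ in range(d)]) for _ in range(m)]

    losses = []
    for t in range(steps + 1):
        # Residual A = C - theta theta^T (symmetric d x d matrix).
        A = [[sum(w[a] * w[b] for w in W) / m - theta[a] * theta[b]
              for b in range(d)] for a in range(d)]
        losses.append(sum(A[a][b] ** 2 for a in range(d) for b in range(d)))
        if t == steps:
            break
        new_W = []
        for w in W:
            # Mean-field-scaled gradient: m * dL/dw_j = 4 * A @ w_j.
            g = [4.0 * sum(A[a][b] * w[b] for b in range(d)) for a in range(d)]
            # Drop the radial component (tangent projection), step, renormalize.
            radial = sum(gi * wi for gi, wi in zip(g, w))
            new_W.append(normalize([wi - lr * (gi - radial * wi)
                                    for gi, wi in zip(g, w)]))
        W = new_W
    return losses, W


def max_loss_gap(m_small, m_large, **kwargs):
    """Largest gap between the loss curves of two widths trained from
    independent initializations; the wider run is a rough proxy for the limit."""
    small, _ = train_width(m_small, seed=1, **kwargs)
    large, _ = train_width(m_large, seed=2, **kwargs)
    return max(abs(a - b) for a, b in zip(small, large))


for m in (4, 16):
    print(m, max_loss_gap(m, 64))
```

Applying `lr` to the $m$-scaled gradient is the usual mean-field time rescaling (equivalent to a raw learning rate of `lr * m`). One would expect the printed gap to shrink as the smaller width grows, but with a single seed per width the numbers are noisy, so treat this as a qualitative check only.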

We study the approximation gap between the dynamics of a polynomial-width neural network and its infinite-width counterpart, both trained using projected gradient descent in the mean-field scaling regime. We demonstrate how to tightly bound this approximation gap through a differential equation governed by the mean-field dynamics. A key factor influencing the growth of this ODE is the local Hessian of each particle, defined as the derivative of the particle's velocity in the mean-field dynamics with respect to its position. We apply our results to the canonical feature learning problem of estimating a well-specified single-index model; we permit the information exponent to be arbitrarily large, leading to convergence times that grow polynomially in the ambient dimension $d$. We show that, due to a certain "self-concordance" property in these problems (where the local Hessian of a particle is bounded by a constant times the particle's velocity), polynomially many neurons are sufficient to closely approximate the mean-field dynamics throughout training.
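The abstract's two key objects, the local Hessian and the self-concordance bound, can be written out as below. The notation ($\mu_t$ for the mean-field distribution of neurons, $\mathcal{L}$ for the population loss, $v_t$ for the velocity field on the sphere, $C$ for the constant) is assumed for illustration and may differ from the paper's. The last lines are a heuristic reading, not the paper's theorem: if the gap obeys a linear differential inequality driven by the local Hessian, Grönwall's inequality bounds it by an exponential of $\int \|H\|$, and self-concordance turns that exponent into $C$ times the length of the particle's path, a quantity that need not grow with the training time itself.

```latex
% Mean-field velocity of a neuron w on the unit sphere (assumed notation)
\frac{d w_t}{dt} = v_t(w_t), \qquad
v_t(w) = -\bigl(I_d - w w^\top\bigr)\, \nabla_w \frac{\delta \mathcal{L}}{\delta \mu}[\mu_t](w)

% Local Hessian: derivative of the velocity with respect to position
H_t(w) = \nabla_w v_t(w)

% Self-concordance, as described in the abstract
\|H_t(w)\| \le C\, \|v_t(w)\|

% Heuristic (not the paper's theorem): if \dot{\Delta}_t \le \|H_t(w_t)\|\,\Delta_t + \varepsilon_t
% with \varepsilon_t \ge 0, then by Gronwall's inequality
\Delta_t \le \Bigl(\Delta_0 + \int_0^t \varepsilon_s\, ds\Bigr)
  \exp\Bigl(\int_0^t \|H_s(w_s)\|\, ds\Bigr),
\qquad
\int_0^t \|H_s(w_s)\|\, ds \le C \int_0^t \|v_s(w_s)\|\, ds
```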
