MLAILGSTFeb 24, 2025

Convergence of Shallow ReLU Networks on Weakly Interacting Data

arXiv:2502.16977v13 citationsh-index: 14
Originality Incremental advance
AI Analysis

This provides theoretical guarantees for training neural networks on high-dimensional data, which is incremental as it builds on existing convergence analysis with specific assumptions.

The paper tackles the problem of analyzing the convergence of shallow ReLU networks trained by gradient flow on weakly interacting data, showing that a network with width of order log(n) neurons suffices for global convergence with high probability, achieving an exponential rate of convergence of 1/n and refined bounds between 1/n and 1/sqrt(n) for orthogonal data.

We analyse the convergence of one-hidden-layer ReLU networks trained by gradient flow on $n$ data points. Our main contribution leverages the high dimensionality of the ambient space, which implies low correlation of the input samples, to demonstrate that a network with width of order $\log(n)$ neurons suffices for global convergence with high probability. Our analysis uses a Polyak-Łojasiewicz viewpoint along the gradient-flow trajectory, which provides an exponential rate of convergence of $\frac{1}{n}$. When the data are exactly orthogonal, we give further refined characterizations of the convergence speed, proving its asymptotic behavior lies between the orders $\frac{1}{n}$ and $\frac{1}{\sqrt{n}}$, and exhibiting a phase-transition phenomenon in the convergence rate, during which it evolves from the lower bound to the upper, and in a relative time of order $\frac{1}{\log(n)}$.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes