LG · AI · ML · May 27, 2025

Saddle-To-Saddle Dynamics in Deep ReLU Networks: Low-Rank Bias in the First Saddle Escape

arXiv:2505.21722v1 · 3 citations · h-index: 15
Originality: Incremental advance
AI Analysis

This work addresses theoretical challenges in training deep neural networks and is aimed at researchers. It is rated incremental because it provides initial steps toward a broader theory.

The paper studies gradient descent dynamics in deep ReLU networks initialized with small weights, focusing on the escape from the initial saddle at the origin. It shows that the optimal escape direction exhibits a low-rank bias in its deeper layers: the first singular value of the ℓ-th layer weight matrix is at least a factor of ℓ^(1/4) larger than any other singular value. The authors argue that this is a first step toward proving Saddle-to-Saddle dynamics.

When a deep ReLU network is initialized with small weights, GD is at first dominated by the saddle at the origin in parameter space. We study the so-called escape directions, which play a role similar to that of the eigenvectors of the Hessian for strict saddles. We show that the optimal escape direction features a low-rank bias in its deeper layers: the first singular value of the $\ell$-th layer weight matrix is at least $\ell^{\frac{1}{4}}$ larger than any other singular value. We also prove a number of related results about these escape directions. We argue that this result is a first step in proving Saddle-to-Saddle dynamics in deep ReLU networks, where GD visits a sequence of saddles with increasing bottleneck rank.
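
The singular-value statement in the abstract can be checked numerically for any given set of layer matrices. The following is a minimal sketch, not the authors' code: it assumes the bound is meant multiplicatively (s_1 / s_2 >= ℓ^(1/4) for layer ℓ, counted from 1), and the helper names `top_singular_ratio` and `satisfies_low_rank_bound`, as well as the synthetic "rank-one plus small noise" weights, are hypothetical and chosen only for illustration.

```python
import numpy as np

def top_singular_ratio(W):
    """Return s_1 / s_2, the ratio of the two largest singular values of W."""
    s = np.linalg.svd(W, compute_uv=False)  # singular values, descending order
    return s[0] / s[1]

def satisfies_low_rank_bound(weights):
    """For layer l = 1, 2, ... check the assumed bound s_1 / s_2 >= l ** 0.25."""
    return [bool(top_singular_ratio(W) >= l ** 0.25)
            for l, W in enumerate(weights, start=1)]

# Hypothetical test data: each layer is a unit-norm rank-one matrix plus
# small Gaussian noise, so its spectrum is dominated by one singular value.
rng = np.random.default_rng(0)
width, depth = 8, 6
weights = []
for _ in range(depth):
    u = rng.standard_normal(width)
    v = rng.standard_normal(width)
    u /= np.linalg.norm(u)
    v /= np.linalg.norm(v)
    noise = 0.01 * rng.standard_normal((width, width))
    weights.append(np.outer(u, v) + noise)

checks = satisfies_low_rank_bound(weights)
```

Here `checks` holds one boolean per layer. The same two helpers could be applied to the weight matrices of a trained or analytically constructed escape direction; the synthetic matrices above only demonstrate the mechanics of the check.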
