LG · AI · May 29

Gradient Descent with Large Step Size Restores Symmetry in Deep Linear Networks with Multi-Pathway

arXiv:2606.05219 · 15.4
Predicted impact: top 54% in LG · last 90 days
Originality: Incremental advance
AI Analysis

For researchers studying deep learning optimization, this paper clarifies how step size and depth interact to shape pathway competition, explaining why large-step GD favors shared representations over single-pathway dominance.

This work shows that discrete Gradient Descent with a large step size restores symmetry in multi-pathway deep linear networks, overriding the symmetry breaking predicted by Gradient Flow. The authors prove that single-path solutions are sharp minima and that distributing signals across pathways reduces sharpness, which leads to a re-balancing phase in which signals redistribute across pathways.

Recent analyses of multi-pathway Deep Linear Networks use Gradient Flow (GF) to predict a "winner-takes-all" specialization in which path symmetry breaks and each feature concentrates in a single pathway. In this work, we show that discrete Gradient Descent (GD) with a large step size tells a different story. We prove that single-path solutions are sharp minima, whereas distributing signals across pathways reduces sharpness by a factor that decreases with both the number of pathways and depth. Consequently, while early training reproduces the depth-driven symmetry breaking predicted by GF, oscillations at the Edge of Stability subsequently override this tendency and drive the network into a re-balancing phase, where signals redistribute across pathways. Together, these results clarify how depth shapes pathway competition and explain why large-step GD favors shared representations rather than persistent single-pathway dominance.
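
To make the sharpness comparison concrete, here is a minimal sketch under assumptions that are not taken from the paper: a toy scalar model f(W) = sum_p prod_l W[p, l] with P pathways of depth L, a squared loss 0.5 * (f(W) - s)^2 toward a scalar target s > 0, and balanced weights inside each active pathway. All function names are hypothetical. At any zero-loss point the Hessian of this loss is the rank-one matrix g g^T, where g is the gradient of f, so the sharpness (top Hessian eigenvalue) equals ||g||^2.

```python
import numpy as np

def end_to_end_grad(W):
    """Gradient of f(W) = sum_p prod_l W[p, l] with respect to every weight.

    W has shape (P, L): P pathways, each a chain of L scalar weights.
    """
    P, L = W.shape
    g = np.empty_like(W, dtype=float)
    for p in range(P):
        for l in range(L):
            # Product of all weights in pathway p except the one at layer l.
            g[p, l] = np.prod(np.delete(W[p], l))
    return g

def sharpness_at_minimum(W):
    """Top Hessian eigenvalue of 0.5 * (f(W) - s)**2 at a zero-loss point.

    With zero residual the Hessian is g g^T, whose only nonzero
    eigenvalue is ||g||^2.
    """
    g = end_to_end_grad(W)
    return float(np.sum(g ** 2))

def single_path(s, P, L):
    """Toy 'winner-takes-all' solution: pathway 0 carries s, the rest are zero."""
    W = np.zeros((P, L))
    W[0, :] = s ** (1.0 / L)
    return W

def shared_paths(s, P, L):
    """Toy symmetric solution: every pathway carries s / P with balanced layers."""
    return np.full((P, L), (s / P) ** (1.0 / L))

def sharpness_ratio(s, P, L):
    """Sharpness of the shared solution divided by that of the single-path one."""
    return sharpness_at_minimum(shared_paths(s, P, L)) / sharpness_at_minimum(
        single_path(s, P, L)
    )

if __name__ == "__main__":
    for P, L in [(2, 2), (2, 3), (4, 3), (4, 6)]:
        print(P, L, sharpness_ratio(1.0, P, L))
```

In this toy model the ratio works out to P^((2 - L) / L): it equals 1 at depth 2 and shrinks as either P or L grows, which agrees in direction with the abstract's claim, though the paper's exact factor may differ. Because GD with step size eta is linearly stable at a minimum only when the sharpness is below 2 / eta, a step size can be too large for the single-path solution yet still admissible for the shared one.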

