LG AI · Feb 5

Emergent Low-Rank Training Dynamics in MLPs with Smooth Activations

arXiv:2602.06208v1 · h-index: 6
Originality: Incremental advance
AI Analysis

The paper provides theoretical justification for low-rank training methods in nonlinear networks. The advance is incremental, but it addresses a gap in understanding that matters to deep learning practitioners.

The paper analyzes gradient-descent training dynamics in multi-layer perceptrons (MLPs) and shows that weight updates concentrate in low-dimensional subspaces. It gives a theoretical characterization of these subspaces for two-layer networks with smooth activations, and it validates empirically that suitably initialized low-rank parameterizations can match the performance of fully parameterized networks on classification tasks.

Recent empirical evidence has demonstrated that the training dynamics of large-scale deep neural networks occur within low-dimensional subspaces. While this has inspired new research into low-rank training, compression, and adaptation, theoretical justification for these dynamics in nonlinear networks remains limited. To address this gap, this paper analyzes the learning dynamics of multi-layer perceptrons (MLPs) under gradient descent (GD). We demonstrate that the weight dynamics concentrate within invariant low-dimensional subspaces throughout training. Theoretically, we precisely characterize these invariant subspaces for two-layer networks with smooth nonlinear activations, providing insight into their emergence. Experimentally, we validate that this phenomenon extends beyond our theoretical assumptions. Leveraging these insights, we empirically show there exists a low-rank MLP parameterization that, when initialized within the appropriate subspaces, matches the classification performance of fully-parameterized counterparts on a variety of classification tasks.
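One way to probe the kind of claim made in the abstract is to train a small two-layer MLP with a smooth activation and inspect the singular values of the accumulated first-layer update. The sketch below is only an illustration under assumptions that are not taken from the paper: the synthetic Gaussian-cluster data, the tanh activation, the squared-error loss, all hyperparameters, and the function names `train_two_layer_mlp` and `cumulative_energy` are hypothetical choices. It does not reproduce the paper's experiments or its characterization of the invariant subspaces.

```python
import numpy as np

def train_two_layer_mlp(d=32, h=32, k=4, n=256, steps=200, lr=0.05, seed=0):
    """Full-batch GD on a two-layer tanh MLP; returns (W1 at init, W1 after training)."""
    rng = np.random.default_rng(seed)
    # Assumed synthetic data: k Gaussian clusters in d dimensions, one column per sample.
    labels = rng.integers(0, k, size=n)
    means = rng.standard_normal((d, k))
    X = means[:, labels] + 0.1 * rng.standard_normal((d, n))
    Y = np.eye(k)[:, labels]                         # one-hot targets, shape (k, n)
    W1 = rng.standard_normal((h, d)) / np.sqrt(d)
    W2 = rng.standard_normal((k, h)) / np.sqrt(h)
    W1_init = W1.copy()
    for _ in range(steps):
        Z = np.tanh(W1 @ X)                          # hidden activations, shape (h, n)
        E = (W2 @ Z - Y) / n                         # d(loss)/d(output) for 0.5 * mean squared error
        grad_W2 = E @ Z.T
        grad_W1 = ((W2.T @ E) * (1.0 - Z**2)) @ X.T  # tanh'(a) = 1 - tanh(a)^2
        W1 -= lr * grad_W1
        W2 -= lr * grad_W2
    return W1_init, W1

def cumulative_energy(delta):
    """Fraction of squared Frobenius norm captured by the top-r singular values, for each r."""
    s = np.linalg.svd(delta, compute_uv=False)       # singular values in descending order
    energy = np.cumsum(s**2)
    return energy / energy[-1]

if __name__ == "__main__":
    k = 4
    W1_init, W1_final = train_two_layer_mlp(k=k)
    frac = cumulative_energy(W1_final - W1_init)
    print(f"energy of the first-layer update in its top {k} directions: {frac[k - 1]:.3f}")
```

If the update is concentrated in a low-dimensional subspace, the printed fraction should be close to 1 for a rank much smaller than the layer width. Because the cluster data used here is itself nearly low-rank, a high value in this toy setting says little about the general case the paper studies.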

