LGNAMLFeb 6, 2024

Provable Emergence of Deep Neural Collapse and Low-Rank Bias in $L^2$-Regularized Nonlinear Networks

arXiv:2402.03991v210 citationsh-index: 5
Originality Incremental advance
AI Analysis

This provides a theoretical explanation for the common occurrence of low-rank bias in deep learning, which is incremental but clarifies a known bottleneck.

The paper proves that deep neural collapse leads to low-rank weight matrices in nonlinear feedforward and residual networks, and shows that interpolating minima are globally optimal with no loss barrier, enabling prediction of singular values before training.

Recent work in deep learning has shown strong empirical and theoretical evidence of an implicit low-rank bias: weight matrices in deep networks tend to be approximately low-rank. Moreover, removing relatively small singular values during training, or from available trained models, may significantly reduce model size while maintaining or even improving model performance. However, the majority of the theoretical investigations around low-rank bias in neural networks deal with oversimplified models, often not taking into account the impact of nonlinearity. In this work, we first of all quantify a link between the phenomenon of deep neural collapse and the emergence of low-rank weight matrices for a general class of feedforward networks with nonlinear activation. In addition, for the general class of nonlinear feedforward and residual networks, we prove the global optimality of deep neural collapsed configurations and the practical absence of a loss barrier between interpolating minima and globally optimal points, offering a possible explanation for its common occurrence. As a byproduct, our theory also allows us to forecast the final global structure of singular values before training. Our theoretical findings are supported by a range of experimental evaluations illustrating the phenomenon.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes