LG, OC, Jan 28, 2022

Training invariances and the low-rank phenomenon: beyond linear networks

arXiv:2201.11968v2, 41 citations
AI Analysis

This work provides incremental theoretical insights into the implicit bias of neural network training, relevant for researchers studying optimization and generalization in deep learning.

The paper extends the theoretical result that deep linear networks converge to rank-1 matrices under gradient flow to the last few linear layers of nonlinear ReLU networks with skip connections, showing this holds for submatrices where neurons are stably-activated but not generally for full matrices.

The implicit bias induced by the training of neural networks has become a topic of rigorous study. In the limit of gradient flow and gradient descent with appropriate step size, it has been shown that when one trains a deep linear network with logistic or exponential loss on linearly separable data, the weights converge to rank-1 matrices. In this paper, we extend this theoretical result to the last few linear layers of the much wider class of nonlinear ReLU-activated feedforward networks containing fully-connected layers and skip connections. Similar to the linear case, the proof relies on specific local training invariances, sometimes referred to as alignment, which we show to hold for submatrices where neurons are stably-activated in all training examples, and it reflects empirical results in the literature. We also show this is not true in general for the full matrix of ReLU fully-connected layers. Our proof relies on a specific decomposition of the network into a multilinear function and another ReLU network whose weights are constant under a certain parameter directional convergence.
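The abstract does not spell out the training invariance it refers to. As an illustration only, the sketch below uses the form commonly stated for adjacent layers of a deep linear network, namely that W_{k+1}^T W_{k+1} - W_k W_k^T stays constant under gradient flow. It covers only the linear baseline described above, not the paper's ReLU and skip-connection setting. The toy data, layer sizes, and all function names (`make_data`, `init_weights`, `grads`, `invariant`, `invariant_rate`, `gd_drift`) are hypothetical choices made for this example.

```python
import numpy as np


def make_data(n=20, d=5, seed=0):
    """Hypothetical toy data, linearly separable by construction."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, d))
    y = np.sign(X[:, 0])      # label is the sign of the first coordinate
    y[y == 0] = 1.0
    X[:, 0] += 0.5 * y        # push every point away from the boundary
    return X, y


def init_weights(d=5, h=8, seed=1, scale=0.3):
    """Three-layer linear network f(x) = W3 W2 W1 x with a scalar output."""
    rng = np.random.default_rng(seed)
    W1 = scale * rng.standard_normal((h, d))
    W2 = scale * rng.standard_normal((h, h))
    W3 = scale * rng.standard_normal((1, h))
    return W1, W2, W3


def grads(W1, W2, W3, X, y):
    """Gradients of the exponential loss mean(exp(-y * f(x)))."""
    f = (X @ W1.T @ W2.T @ W3.T).ravel()   # network outputs, shape (n,)
    r = -y * np.exp(-y * f) / len(y)        # d(loss)/d(f_i)
    g = (r @ X)[None, :]                    # d(loss)/d(W3 W2 W1), shape (1, d)
    G1 = (W3 @ W2).T @ g
    G2 = W3.T @ g @ W1.T
    G3 = g @ (W2 @ W1).T
    return G1, G2, G3


def invariant(Wk, Wk1):
    """Quantity assumed to be conserved for the adjacent pair (W_k, W_{k+1})."""
    return Wk1.T @ Wk1 - Wk @ Wk.T


def invariant_rate(Wk, Gk, Wk1, Gk1):
    """Time derivative of invariant(Wk, Wk1) under gradient flow dW/dt = -G."""
    return -(Gk1.T @ Wk1 + Wk1.T @ Gk1) + (Gk @ Wk.T + Wk @ Gk.T)


def gd_drift(steps=200, lr=1e-3):
    """Largest entrywise change of the invariant after plain gradient descent."""
    X, y = make_data()
    W1, W2, W3 = init_weights()
    start = invariant(W1, W2)
    for _ in range(steps):
        G1, G2, G3 = grads(W1, W2, W3, X, y)
        W1, W2, W3 = W1 - lr * G1, W2 - lr * G2, W3 - lr * G3
    return np.abs(invariant(W1, W2) - start).max()


if __name__ == "__main__":
    X, y = make_data()
    W1, W2, W3 = init_weights()
    G1, G2, G3 = grads(W1, W2, W3, X, y)
    print("max |d/dt invariant|, layers 1-2:", np.abs(invariant_rate(W1, G1, W2, G2)).max())
    print("max |d/dt invariant|, layers 2-3:", np.abs(invariant_rate(W2, G2, W3, G3)).max())
    print("drift after discrete gradient descent:", gd_drift())
```

The flow derivative vanishes up to floating-point error because the first-order terms cancel exactly: W_{k+1}^T G_{k+1} equals G_k W_k^T at every point in parameter space. Discrete gradient descent only conserves the quantity approximately, since each step adds a term of order lr squared. In the linear case this conservation is what couples adjacent layers: the last layer here is a row vector, so W3^T W3 has rank 1, and if the conserved difference stays bounded while the weight norms grow, the normalized W2 W2^T is pushed toward rank 1 as well.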
