LG
Oct 24, 2025

Generalization Bounds for Rank-sparse Neural Networks

arXiv:2510.21945v23 citations
h-index: 6
Originality: Incremental advance
AI Analysis

The paper offers theoretical insight into generalization for researchers in machine learning theory, but the advance is incremental because it builds on known bottleneck-rank observations.

The paper addresses generalization in neural networks by exploiting their observed low-rank structure. It proves generalization bounds that depend on the rank of the weight matrices, and shows a sample complexity of $\widetilde{O}(WrL^2)$ when the bounds are stated in terms of Schatten $p$ quasi-norms with small $p$.

It has recently been observed in much of the literature that neural networks exhibit a bottleneck rank property: at larger depths, the activations and weights of neural networks trained with gradient-based methods tend to be approximately low rank. In fact, the rank of the activations of each layer converges to a fixed value referred to as the "bottleneck rank", which is the minimum rank required to represent the training data. This perspective is in line with the observation that regularizing linear networks (without activations) with weight decay is equivalent to minimizing the Schatten $p$ quasi-norm of the neural network. In this paper we investigate the implications of this phenomenon for generalization. More specifically, we prove generalization bounds for neural networks which exploit the approximate low-rank structure of the weight matrices, if present. The final results rely on the Schatten $p$ quasi-norms of the weight matrices: for small $p$, the bounds exhibit a sample complexity of $\widetilde{O}(WrL^2)$, where $W$ and $L$ are the width and depth of the neural network respectively and $r$ is the rank of the weight matrices. As $p$ increases, the bound behaves more like a norm-based bound instead.
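
As a minimal sketch of why small $p$ ties the Schatten quasi-norm to rank: under the standard definition $\|A\|_{S_p}^p = \sum_i \sigma_i(A)^p$, every nonzero singular value contributes a term that tends to 1 as $p \to 0$, so the sum tends to the rank, while at $p = 2$ it equals the squared Frobenius norm. The spectrum below is a hypothetical example, and the helper name `schatten_p_power` and its `tol` cutoff are illustrative assumptions, not taken from the paper.

```python
def schatten_p_power(singular_values, p, tol=1e-12):
    """Return sum_i sigma_i**p over singular values above tol.

    This is the p-th power of the Schatten p quasi-norm of a matrix
    whose singular values are given. Values at or below tol are treated
    as exact zeros (an assumed cutoff for this illustration).
    """
    return sum(s ** p for s in singular_values if s > tol)


# Hypothetical spectrum of a width-8 weight matrix with rank 3.
spectrum = [5.0, 2.0, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0]

# As p shrinks, the sum moves from a norm-like quantity toward the rank.
for p in (2.0, 1.0, 0.5, 0.1, 0.01):
    print(f"p={p:<5} sum sigma^p = {schatten_p_power(spectrum, p):.4f}")
```

For a real weight matrix, the singular values would first have to be computed with an SVD routine; that step is omitted here to keep the sketch dependency-free.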

