LG · Jul 10, 2025

Some Theoretical Results on Layerwise Effective Dimension Oscillations in Finite Width ReLU Networks

arXiv:2507.07675v2
Originality: Incremental advance
AI Analysis

Provides theoretical insights into rank dynamics in neural networks, relevant for researchers studying expressivity and training dynamics.

The paper analyzes the layerwise effective dimension of finite-width ReLU networks. It shows that the expected rank deficit decays geometrically with ratio 1 - 2/π ≈ 0.3634, that the expected rank peaks at specific "revival" depths, and that this oscillatory behavior is a finite-width phenomenon that disappears under orthogonal initialization or strong negative-slope leaky-ReLU.

We analyze the layerwise effective dimension (rank of the feature matrix) in fully-connected ReLU networks of finite width. Specifically, for a fixed batch of $m$ inputs and random Gaussian weights, we derive closed-form expressions for the expected rank of the $m\times n$ hidden activation matrices. Our main result shows that $\mathbb{E}[\mathrm{EDim}(\ell)]=m[1-(1-2/\pi)^\ell]+O(e^{-c m})$ so that the rank deficit decays geometrically with ratio $1-2/\pi\approx 0.3634$. We also prove a sub-Gaussian concentration bound, and identify the "revival" depths at which the expected rank attains local maxima. In particular, these peaks occur at depths $\ell_k^*\approx(k+1/2)\pi/\log(1/\rho)$ with height $\approx (1-e^{-\pi/2}) m \approx 0.79m$. We further show that this oscillatory rank behavior is a finite-width phenomenon: under orthogonal weight initialization or strong negative-slope leaky-ReLU, the rank remains (nearly) full. These results provide a precise characterization of how random ReLU layers alternately collapse and partially revive the subspace of input variations, adding nuance to prior work on expressivity of deep networks.
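The constants quoted in the abstract can be checked numerically. The sketch below only evaluates the stated closed-form expressions; it does not simulate a network and is not the authors' code. The function names and the batch size `m = 100` are hypothetical. It also assumes that $\rho$ in the revival-depth formula equals the decay ratio $1-2/\pi$, which the abstract does not state.

```python
import math

# Decay ratio quoted in the abstract: 1 - 2/pi (about 0.3634).
RATIO = 1.0 - 2.0 / math.pi


def expected_rank(m, depth):
    """Leading term m * (1 - RATIO**depth) of the abstract's formula.

    The O(exp(-c*m)) correction is dropped because the constant c is not given.
    """
    return m * (1.0 - RATIO ** depth)


def revival_depth(k, rho=RATIO):
    """Approximate k-th revival depth (k + 1/2) * pi / log(1/rho).

    ASSUMPTION: rho defaults to the decay ratio 1 - 2/pi; the abstract
    uses rho without defining it.
    """
    return (k + 0.5) * math.pi / math.log(1.0 / rho)


def peak_height_fraction():
    """Peak height as a fraction of m: 1 - exp(-pi/2)."""
    return 1.0 - math.exp(-math.pi / 2.0)


if __name__ == "__main__":
    m = 100  # hypothetical batch size, chosen only for illustration
    print(f"decay ratio          : {RATIO:.4f}")
    print(f"peak height fraction : {peak_height_fraction():.4f}")
    for depth in range(1, 6):
        print(f"depth {depth}: leading-term expected rank = {expected_rank(m, depth):.2f}")
    for k in range(3):
        print(f"revival depth k={k}: {revival_depth(k):.3f}")
```

Because the leading term $m[1-(1-2/\pi)^\ell]$ increases monotonically in $\ell$, this sketch cannot reproduce the oscillations themselves; it is useful only for confirming the rounded values $0.3634$ and $0.79$ and for locating the predicted revival depths under the stated assumption on $\rho$.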

