Global law of conjugate kernel random matrices with heavy-tailed weights
This work addresses the theoretical understanding of neural network behavior under heavy-tailed weights. The contribution is incremental in that it extends existing spectral analysis to new distributional assumptions.
The paper studies the asymptotic spectral behavior of conjugate kernel random matrices arising from two-layer neural networks with heavy-tailed weight distributions. It shows that heavy-tailed weights induce strong correlations between entries and lead to fundamentally different spectral properties compared to light-tailed models.
We study the asymptotic spectral behavior of the conjugate kernel random matrix $YY^\top$, where $Y = f(WX)$ arises from a two-layer neural network model. We consider the setting where $W$ and $X$ are both random rectangular matrices with i.i.d. entries, where the entries of $W$ follow a heavy-tailed distribution, while those of $X$ have light tails. Our assumptions on $W$ cover a broad class of heavy-tailed distributions, such as symmetric $\alpha$-stable laws with $\alpha \in (0,2)$ and sparse matrices with $\mathcal{O}(1)$ nonzero entries per row. The activation function $f$, applied entrywise, is nonlinear, smooth, and odd. By computing the eigenvalue distribution of $YY^\top$ through its moments, we show that heavy-tailed weights induce strong correlations between the entries of $Y$, leading to richer and fundamentally different spectral behavior compared to models with light-tailed weights.
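As a rough numerical illustration of this setting, the sketch below samples one sparse instance of $W$ (about $c$ nonzero entries per row), a Gaussian $X$, and computes the eigenvalues and first few empirical spectral moments of $YY^\top$. Everything specific here is an assumption made for illustration and is not taken from the paper: the function names `conjugate_kernel_eigs` and `empirical_moments`, the choice $f = \tanh$ (one smooth, odd, nonlinear activation), the $\pm 1$ values of the nonzero entries, the $1/m$ normalization, and the matrix dimensions.

```python
import numpy as np

def conjugate_kernel_eigs(p=300, n=400, m=500, c=3.0, seed=0):
    """Eigenvalues of (1/m) * Y Y^T with Y = tanh(W X).

    W is p x n and sparse: each entry is nonzero with probability c/n,
    so a row has about c nonzero entries on average (assumed model).
    X is n x m with i.i.d. standard Gaussian (light-tailed) entries.
    """
    rng = np.random.default_rng(seed)
    mask = rng.random((p, n)) < c / n            # sparsity pattern
    signs = rng.choice([-1.0, 1.0], size=(p, n)) # symmetric nonzero values
    W = mask * signs
    X = rng.standard_normal((n, m))
    Y = np.tanh(W @ X)                           # entrywise odd activation
    K = (Y @ Y.T) / m                            # assumed normalization
    return np.linalg.eigvalsh(K)                 # real eigenvalues, ascending

def empirical_moments(eigs, k_max=4):
    """k-th moment of the empirical spectral distribution, k = 1..k_max."""
    return [float(np.mean(eigs ** k)) for k in range(1, k_max + 1)]

if __name__ == "__main__":
    eigs = conjugate_kernel_eigs()
    print(empirical_moments(eigs))
```

The moments returned by `empirical_moments` are the normalized traces $\frac{1}{p}\operatorname{tr}\big((YY^\top/m)^k\big)$, which is the finite-size quantity a moment-based analysis tracks. Replacing the sparse mask with Gaussian entries of variance $1/n$ gives a light-tailed baseline for comparison.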