PR · LG · Feb 17, 2017

A Random Matrix Approach to Neural Networks

arXiv:1702.05419v2 · 188 citations
AI Analysis

The paper provides theoretical insights for tuning the hyperparameters of random neural networks, though the contribution is incremental in that it builds on existing random matrix theory.

The paper analyzes the Gram random matrices that arise in random neural networks. It proves that their resolvent behaves like that of sample covariance matrix models, which yields a deterministic equivalent for the empirical spectral measure and enables asymptotic performance estimation.

This article studies the Gram random matrix model $G=\frac1TΣ^{\rm T}Σ$, $Σ=σ(WX)$, classically found in the analysis of random feature maps and random neural networks, where $X=[x_1,\ldots,x_T]\in{\mathbb R}^{p\times T}$ is a (data) matrix of bounded norm, $W\in{\mathbb R}^{n\times p}$ is a matrix of independent zero-mean, unit-variance entries, and $σ:{\mathbb R}\to{\mathbb R}$ is a Lipschitz continuous (activation) function, with $σ(WX)$ understood entry-wise. By means of a key concentration of measure lemma arising from non-asymptotic random matrix arguments, we prove that, as $n,p,T$ grow large at the same rate, the resolvent $Q=(G+γI_T)^{-1}$, for $γ>0$, behaves similarly to that met in sample covariance matrix models, involving notably the moment $Φ=\frac{T}n{\mathbb E}[G]$, which provides in passing a deterministic equivalent for the empirical spectral measure of $G$. Application-wise, this result enables the estimation of the asymptotic performance of single-layer random neural networks. This in turn provides practical insights into the underlying mechanisms at play in random neural networks, entailing several unexpected consequences, as well as a fast practical means to tune the network hyperparameters.
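To make the objects in the abstract concrete, here is a minimal NumPy sketch that builds $G$ and $Q$ for one random draw of $W$. It is an illustration under stated assumptions, not the authors' code: the Gaussian choice for $W$ and $X$, the `tanh` activation, the dimensions, and the helper name `gram_and_resolvent` are all hypothetical. The last part assumes the single-layer network is trained with a ridge-regression readout on targets `Y`, which the abstract does not spell out. Under that assumption the training error can be written purely in terms of $Q$.

```python
import numpy as np

def gram_and_resolvent(X, n, gamma, sigma=np.tanh, rng=None):
    """Build Sigma = sigma(W X), G = Sigma^T Sigma / T and Q = (G + gamma I_T)^{-1}.

    Hypothetical helper: W is drawn with i.i.d. standard Gaussian entries
    (zero mean, unit variance) and sigma must be a Lipschitz activation.
    """
    rng = np.random.default_rng(rng)
    p, T = X.shape
    W = rng.standard_normal((n, p))           # n x p random weights
    Sigma = sigma(W @ X)                      # n x T, activation applied entry-wise
    G = Sigma.T @ Sigma / T                   # T x T Gram matrix
    Q = np.linalg.inv(G + gamma * np.eye(T))  # resolvent, well defined for gamma > 0
    return Sigma, G, Q

# Illustrative sizes only; the theory concerns n, p, T growing at the same rate.
p, T, n, gamma = 40, 60, 80, 0.5
data_rng = np.random.default_rng(0)
X = data_rng.standard_normal((p, T)) / np.sqrt(p)  # scaled so the norm stays bounded

Sigma, G, Q = gram_and_resolvent(X, n, gamma, rng=1)

# (1/T) tr Q is the Stieltjes transform of the empirical spectral measure
# of G evaluated at z = -gamma, so it can also be read off the eigenvalues.
eigvals = np.linalg.eigvalsh(G)
stieltjes_from_Q = np.trace(Q) / T
stieltjes_from_eigs = np.mean(1.0 / (eigvals + gamma))

# Assumed ridge readout: beta = Sigma Q Y^T / T for targets Y of shape d x T.
d = 3
Y = data_rng.standard_normal((d, T))
beta = Sigma @ Q @ Y.T / T
train_mse_direct = np.linalg.norm(Y.T - Sigma.T @ beta) ** 2 / T
# Since I - G Q = gamma Q, the same error equals (gamma^2 / T) tr(Y Q^2 Y^T).
train_mse_from_Q = gamma**2 * np.trace(Y @ Q @ Q @ Y.T) / T
```

The two Stieltjes values agree up to floating-point error, as do the two training-error expressions. This is why a deterministic approximation of $Q$ is enough to predict performance without sampling $W$.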

Code Implementations · 1 repo