LGJan 30, 2023
Optimal Approximation Complexity of High-Dimensional Functions with Neural NetworksVincent P. H. Goverse, Jad Hamdan, Jared Tanner
We investigate properties of neural networks that use both ReLU and $x^2$ as activation functions and build upon previous results to show that both analytic functions and functions in Sobolev spaces can be approximated by such networks of constant depth to arbitrary accuracy, demonstrating optimal order approximation rates across all nonlinear approximators, including standard ReLU networks. We then show how to leverage low local dimensionality in some contexts to overcome the curse of dimensionality, obtaining approximation rates that are optimal for unknown lower-dimensional subspaces.
PRJul 11, 2024
Genus expansion for non-linear random matrix ensembles with applications to neural networksNicola Muca Cirone, Jad Hamdan, Cristopher Salvi
We present a unified approach to studying certain non-linear random matrix ensembles and associated random neural networks at initialization. This begins with a novel series expansion for neural networks which generalizes Faá di Bruno's formula to an arbitrary number of compositions. The role of monomials is played by random multilinear maps indexed by directed graphs, whose edges correspond to random matrices. Crucially, this expansion linearizes the effect of the activation functions, allowing for the direct application of Wick's principle and the genus expansion technique. As an application, we prove several results about neural networks with random weights. We first give a new proof of the fact that they converge to Gaussian processes as their width tends to infinity. Secondly, we quantify the rate of convergence of the Neural Tangent Kernel to its deterministic limit in Frobenius norm. Finally, we compute the moments of the limiting spectral distribution of the Jacobian (only the first two of which were previously known), expressing them as sums over non-crossing partitions. All of these results are then generalised to the case of neural networks with sparse and non-Gaussian weights, under moment assumptions.