Lucia Celli

h-index2
2papers

2 Papers

6.0MLMay 22
Optimal Non-Asymptotic Edgeworth Expansions for Multivariate Neural Network Outputs

Lucia Celli

Finite-width fully connected neural networks with Gaussian-initialized weights deviate from their infinite-width Gaussian limit, exhibiting non-vanishing higher-order cumulants. We approximate these deviations, for a neural network evaluated in a finite number of inputs, using multidimensional Edgeworth expansions of arbitrary order $4m-1$, with $m\in\mathbb{N}$. Assuming that the corresponding Gaussian limit has an invertible covariance matrix and that the activation function is polynomially bounded, we establish a bound of order $n^{-m}$ on the total variation distance between the law of the true network output and its Edgeworth approximation, with matching lower bounds. As an application, we quantify the error in Bayesian posterior distributions when the prior is replaced by its Edgeworth expansion. Our results are more general and also apply to sequences of conditionally Gaussian vectors converging to a Gaussian vector with invertible covariance.

PRApr 11, 2025
Entropic bounds for conditionally Gaussian vectors and applications to neural networks

Lucia Celli, Giovanni Peccati

Using entropic inequalities from information theory, we provide new bounds on the total variation and 2-Wasserstein distances between a conditionally Gaussian law and a Gaussian law with invertible covariance matrix. We apply our results to quantify the speed of convergence to Gaussian of a randomly initialized fully connected neural network and its derivatives - evaluated in a finite number of inputs - when the initialization is Gaussian and the sizes of the inner layers diverge to infinity. Our results require mild assumptions on the activation function, and allow one to recover optimal rates of convergence in a variety of distances, thus improving and extending the findings of Basteri and Trevisan (2023), Favaro et al. (2023), Trevisan (2024) and Apollonio et al. (2024). One of our main tools are the quantitative cumulant estimates established in Hanin (2024). As an illustration, we apply our results to bound the total variation distance between the Bayesian posterior law of the neural network and its derivatives, and the posterior law of the corresponding Gaussian limit: this yields quantitative versions of a posterior CLT by Hron et al. (2022), and extends several estimates by Trevisan (2024) to the total variation metric.