Finite-Dimensional Gaussian Approximation for Deep Neural Networks: Universality in Random Weights
This work provides theoretical insight into the universality of Gaussian approximations for deep neural networks. The contribution is incremental but foundational for understanding network behavior at random initialization.
The paper approximates the finite-dimensional distributions of deep neural networks with random weights by their Gaussian limits and establishes convergence rates in the Wasserstein-1 norm. For example, when all layer widths are proportional to $n$ and there are $L-1$ hidden layers, the rate is of order $n^{-(1/6)^{L-1} + \varepsilon}$ for any $\varepsilon > 0$.
We study the finite-dimensional distributions (FDDs) of deep neural networks with randomly initialized weights that have finite-order moments. Specifically, we establish Gaussian approximation bounds in the Wasserstein-$1$ norm between the FDDs and their Gaussian limit, assuming a Lipschitz activation function and allowing the layer widths to grow to infinity at arbitrary relative rates. In the special case where all widths are proportional to a common scale parameter $n$ and there are $L-1$ hidden layers, we obtain convergence rates of order $n^{-(1/6)^{L-1} + \varepsilon}$, for any $\varepsilon > 0$.
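To get a feel for how quickly the stated rate degrades with depth, the following minimal sketch evaluates the exponent $(1/6)^{L-1}$ and the quantity $n^{-(1/6)^{L-1} + \varepsilon}$ for a few depths. The function names `rate_exponent` and `rate_bound` are hypothetical, the multiplicative constant in the bound is not given in the abstract and is ignored here, and restricting to $L \ge 2$ (at least one hidden layer) is an assumption of this sketch rather than a statement from the paper.

```python
from fractions import Fraction


def rate_exponent(num_layers: int) -> Fraction:
    """Return (1/6)^(L-1), the leading exponent for L-1 hidden layers.

    Assumption of this sketch: we only accept L >= 2, i.e. at least
    one hidden layer.
    """
    if num_layers < 2:
        raise ValueError("this sketch assumes at least one hidden layer (L >= 2)")
    return Fraction(1, 6) ** (num_layers - 1)


def rate_bound(n: int, num_layers: int, eps: float = 0.0) -> float:
    """Evaluate n^{-(1/6)^{L-1} + eps}, ignoring the unspecified constant."""
    return n ** (-float(rate_exponent(num_layers)) + eps)


if __name__ == "__main__":
    # Print depth L, the exact exponent, and the rate at width scale n = 10^6.
    for L in (2, 3, 4):
        print(L, rate_exponent(L), rate_bound(10**6, L))
```

With $\varepsilon = 0$ and $n = 10^6$, one hidden layer ($L = 2$) gives $n^{-1/6} = 0.1$, while two hidden layers ($L = 3$) give $n^{-1/36} \approx 0.68$. The exponent shrinks geometrically in the depth, so the guaranteed rate becomes very slow for deep networks.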