Uncertainty Quantification From Scaling Laws in Deep Neural Networks

arXiv:2503.05938v11 citations | h-index: 111 | Machine Learning: Science and Technology
Originality: Incremental advance
AI Analysis

This work addresses uncertainty quantification for machine learning applications in the physical sciences, focusing on the uncertainty inherited from network initialization. It is rated an incremental advance because it builds on existing neural tangent kernel theory.

The paper quantifies uncertainty in deep neural networks by analyzing how the mean and variance of the test loss scale with training set size. It finds that the coefficient of variation becomes independent of training set size at both infinite and finite width once the dataset is sufficiently large, which implies that the finite-width value can be approximated by the infinite-width one.

Quantifying the uncertainty from machine learning analyses is critical to their use in the physical sciences. In this work we focus on uncertainty inherited from the initialization distribution of neural networks. We compute the mean $\mu_{\mathcal{L}}$ and variance $\sigma_{\mathcal{L}}^2$ of the test loss $\mathcal{L}$ for an ensemble of multi-layer perceptrons (MLPs) with neural tangent kernel (NTK) initialization in the infinite-width limit, and compare empirically to the results from finite-width networks for three example tasks: MNIST classification, CIFAR classification and calorimeter energy regression. We observe scaling laws as a function of training set size $N_\mathcal{D}$ for both $\mu_{\mathcal{L}}$ and $\sigma_{\mathcal{L}}$, but find that the coefficient of variation $\epsilon_{\mathcal{L}} \equiv \sigma_{\mathcal{L}}/\mu_{\mathcal{L}}$ becomes independent of $N_\mathcal{D}$ at both infinite and finite width for sufficiently large $N_\mathcal{D}$. This implies that the coefficient of variation of a finite-width network may be approximated by its infinite-width value, and may in principle be calculable using finite-width perturbation theory.
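The relationship between the two scaling laws and the coefficient of variation can be illustrated with a minimal NumPy sketch. It is not the authors' code: the helper names `loss_statistics` and `fit_power_law`, the synthetic losses, and the values `alpha_true = 0.5` and `eps_true = 0.1` are all assumptions made for illustration, and no numbers here come from the paper. The sketch assumes the mean and standard deviation of the test loss both follow a power law in the training set size with the same exponent, in which case their ratio does not depend on the training set size.

```python
import numpy as np


def loss_statistics(losses):
    """Ensemble mean, standard deviation, and coefficient of variation.

    `losses` has shape (n_sizes, n_ensemble): one row per training set size,
    one column per independently initialized network.
    """
    losses = np.asarray(losses, dtype=float)
    mu = losses.mean(axis=1)
    sigma = losses.std(axis=1, ddof=1)  # sample standard deviation over the ensemble
    return mu, sigma, sigma / mu


def fit_power_law(n, y):
    """Least-squares fit of y ~ c * n**(-alpha) in log-log space; returns (alpha, c)."""
    slope, intercept = np.polyfit(np.log(n), np.log(y), 1)
    return -slope, np.exp(intercept)


# Synthetic stand-in for measured test losses (hypothetical, NOT data from the paper).
rng = np.random.default_rng(0)
train_sizes = np.array([100, 200, 400, 800, 1600, 3200], dtype=float)
alpha_true = 0.5   # assumed common scaling exponent
eps_true = 0.1     # assumed coefficient of variation
n_ensemble = 2000  # assumed number of random initializations per training set size

# Each row: losses whose mean and spread both scale as N**(-alpha_true).
mean_loss = train_sizes ** (-alpha_true)
noise = rng.standard_normal((train_sizes.size, n_ensemble))
losses = mean_loss[:, None] * (1.0 + eps_true * noise)

mu, sigma, eps = loss_statistics(losses)
alpha_mu, _ = fit_power_law(train_sizes, mu)
alpha_sigma, _ = fit_power_law(train_sizes, sigma)

print("fitted exponent of the mean:", alpha_mu)
print("fitted exponent of the standard deviation:", alpha_sigma)
print("coefficient of variation per training set size:", eps)
```

With real measurements, `losses` would hold the test loss of each trained network in the ensemble. A fitted exponent for the standard deviation that differs from the one for the mean would show up as a residual power-law trend in `eps`.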

