Beyond NNGP: Large Deviations and Feature Learning in Bayesian Neural Networks
This work, aimed at researchers in theoretical machine learning, addresses feature learning and posterior behavior in wide Bayesian neural networks beyond the NNGP limit.
We study wide Bayesian neural networks, focusing on the rare but statistically dominant fluctuations that govern posterior concentration beyond Gaussian-process limits. Large-deviation theory yields explicit variational objectives, namely rate functions on predictors, which provide a notion of complexity and feature learning that emerges directly at the functional level. We show that the rate function of the posterior output is obtained by a joint optimization over predictors and internal kernels, in contrast with fixed-kernel (NNGP) theory. Numerical experiments show that the resulting predictions accurately describe finite-width behavior for moderately sized networks, capturing non-Gaussian tails, posterior deformation, and data-dependent kernel selection effects.
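The contrast between a fixed kernel and a joint optimization over predictor and kernel can be illustrated with a toy model. The sketch below is a hypothetical example, not the paper's construction. It assumes a one-hidden-layer tanh network with scalar input x = 1, i.i.d. N(0, 1) hidden pre-activations h_i and readout weights a_i, and a 1/N readout scaling chosen so that, conditional on the internal kernel K = (1/N) sum_i tanh(h_i)^2, the output is exactly Gaussian with variance K/N. Under these assumptions the contraction principle gives an output rate function I_f(y) = min_k [y^2/(2k) + I_K(k)], where I_K is the Cramer rate function of the kernel. Freezing k at its typical value k* recovers a Gaussian, NNGP-like rate y^2/(2k*). All names (output_rate, fixed_kernel_rate, the grids) are illustrative.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

# Toy model (assumption, not the paper's setup): f = (1/N) * sum_i a_i * tanh(h_i),
# with a_i ~ N(0, 1) and h_i ~ N(0, 1) i.i.d. Conditional on the internal kernel
# K = (1/N) * sum_i tanh(h_i)^2, the output is exactly N(0, K / N).

# Gauss-Hermite rule for E[g(h)], h ~ N(0, 1); hermegauss uses weight exp(-h^2/2).
nodes, weights = hermegauss(80)
weights = weights / weights.sum()           # normalize to a probability rule
s = np.tanh(nodes) ** 2                     # per-unit kernel contribution, in [0, 1)

t_grid = np.linspace(-200.0, 200.0, 8001)   # dual variable for the Legendre transform
log_mgf = np.log(np.exp(np.outer(t_grid, s)) @ weights)  # Lambda(t) = log E[exp(t s)]

k_star = float(weights @ s)                 # typical kernel value E[tanh(h)^2]
k_grid = np.sort(np.append(np.linspace(0.05, 0.95, 181), k_star))

# Cramer rate function of the kernel on the grid: I_K(k) = sup_t [t k - Lambda(t)].
rate_kernel = np.max(np.outer(k_grid, t_grid) - log_mgf[None, :], axis=1)


def output_rate(y):
    """Joint optimization I_f(y) = min_k [y^2 / (2k) + I_K(k)]; returns (rate, argmin k)."""
    objective = y**2 / (2.0 * k_grid) + rate_kernel
    i = int(np.argmin(objective))
    return float(objective[i]), float(k_grid[i])


def fixed_kernel_rate(y):
    """Fixed-kernel (NNGP-like) counterpart: the kernel is frozen at k_star."""
    return y**2 / (2.0 * k_star)


for y in (0.5, 1.0, 2.0):
    rate, k_opt = output_rate(y)
    print(f"y={y}: joint={rate:.4f}  fixed-kernel={fixed_kernel_rate(y):.4f}  k_opt={k_opt:.3f}")
```

Because I_K(k*) = 0, the joint rate in this toy never exceeds the fixed-kernel rate, so large outputs are exponentially more likely than the Gaussian prediction suggests (heavier tails), and the minimizing k moves above k* as |y| grows, a simple analogue of the kernel deformation described above.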