From Shallow Bayesian Neural Networks to Gaussian Processes: General Convergence, Identifiability and Scalable Inference
This work provides a more general theoretical understanding of BNN-GP convergence and offers a scalable inference method, which is significant for researchers and practitioners working with Bayesian deep learning and Gaussian processes.
This paper studies the convergence of shallow Bayesian neural networks (BNNs) to Gaussian processes (GPs) and proposes a new covariance function that is a convex mixture of components induced by four activation functions. The authors also develop a scalable maximum a posteriori (MAP) training and prediction procedure based on a Nyström approximation, and report stable hyperparameter estimates and competitive predictive performance on simulations and real-world tabular datasets.
In this work, we study scaling limits of shallow Bayesian neural networks (BNNs) via their connection to Gaussian processes (GPs), with an emphasis on statistical modeling, identifiability, and scalable inference. We first establish a general convergence result from BNNs to GPs by relaxing assumptions used in prior formulations, and we compare alternative parameterizations of the limiting GP model. Building on this theory, we propose a new covariance function defined as a convex mixture of components induced by four widely used activation functions, and we characterize its key properties, including positive definiteness and both strict and practical identifiability under different input designs. For computation, we develop a scalable maximum a posteriori (MAP) training and prediction procedure using a Nyström approximation, and we show how the Nyström rank and anchor selection control the cost-accuracy trade-off. Experiments on controlled simulations and real-world tabular datasets demonstrate stable hyperparameter estimates and competitive predictive performance at realistic computational cost.
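To make the cost-accuracy trade-off of the Nyström approximation concrete, the following is a minimal NumPy sketch, not the paper's implementation. It builds a convex mixture of two placeholder kernels (an RBF and a linear kernel, chosen only for illustration; the abstract does not specify the four activation-induced components) and measures how the relative approximation error changes with the Nyström rank. The function names, the mixture weights, the jitter value, and the uniform random anchor selection are all assumptions.

```python
import numpy as np

def rbf_kernel(X, Z, lengthscale=1.0):
    # Placeholder component kernel (assumption, not from the paper).
    sq = np.sum(X**2, axis=1)[:, None] + np.sum(Z**2, axis=1)[None, :] - 2.0 * X @ Z.T
    return np.exp(-0.5 * np.maximum(sq, 0.0) / lengthscale**2)

def linear_kernel(X, Z):
    # Second placeholder component kernel (assumption).
    return 1.0 + X @ Z.T

def mixture_kernel(X, Z, weights):
    # Convex mixture: weights are non-negative and sum to one.
    weights = np.asarray(weights, dtype=float)
    if np.any(weights < 0) or not np.isclose(weights.sum(), 1.0):
        raise ValueError("weights must lie on the probability simplex")
    return weights[0] * rbf_kernel(X, Z) + weights[1] * linear_kernel(X, Z)

def nystrom_factor(X, anchors, weights, jitter=1e-6):
    # Returns F (n x m) such that F @ F.T approximates K(X, X)
    # via K_nm K_mm^{-1} K_mn, with a small jitter for stability.
    K_mm = mixture_kernel(anchors, anchors, weights)
    K_nm = mixture_kernel(X, anchors, weights)
    L = np.linalg.cholesky(K_mm + jitter * np.eye(len(anchors)))
    return np.linalg.solve(L, K_nm.T).T

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 2))
weights = [0.7, 0.3]                      # hypothetical mixture weights
K_full = mixture_kernel(X, X, weights)
perm = rng.permutation(len(X))            # nested, uniformly random anchors

errors = {}
for m in (5, 20, 80):
    F = nystrom_factor(X, X[perm[:m]], weights)
    errors[m] = np.linalg.norm(K_full - F @ F.T) / np.linalg.norm(K_full)
    print(f"rank {m:3d}: relative Frobenius error {errors[m]:.2e}")
```

Once the factor `F` is available, downstream linear algebra can work with an n x m matrix instead of the full n x n covariance, which is where the computational savings come from. Larger ranks cost more but typically track the full covariance more closely, and other anchor selection schemes could be substituted for the random permutation used here.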