On the expected behaviour of noise regularised deep neural networks as Gaussian processes
This work advances the theoretical understanding of noise regularization in NNGPs and is aimed at machine learning researchers, but it is incremental, as it builds on the established equivalence between neural networks and Gaussian processes.
The paper investigated how noise regularization, such as dropout, affects neural network Gaussian processes (NNGPs) by relating their behavior to signal propagation theory in noise regularized deep neural networks. For ReLU activations, it found that the best-performing NNGPs have kernel parameters matching a recently proposed initialization scheme for noise regularized ReLU networks, and that noise strengthens the prior towards simpler functions away from the training points. These findings were validated experimentally on MNIST, CIFAR-10, and synthetic data.
Recent work has established the equivalence between deep neural networks and Gaussian processes (GPs), resulting in so-called neural network Gaussian processes (NNGPs). The behaviour of these models depends on the initialisation of the corresponding network. In this work, we consider the impact of noise regularisation (e.g. dropout) on NNGPs, and relate their behaviour to signal propagation theory in noise regularised deep neural networks. For ReLU activations, we find that the best performing NNGPs have kernel parameters that correspond to a recently proposed initialisation scheme for noise regularised ReLU networks. In addition, we show how the noise influences the covariance matrix of the NNGP, producing a stronger prior towards simple functions away from the training points. We verify our theoretical findings with experiments on MNIST and CIFAR-10 as well as on synthetic data.
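To make the effect of noise on the NNGP covariance more concrete, here is a minimal sketch of a layer-by-layer kernel recursion. It rests on assumptions that are ours, not taken from the abstract: multiplicative noise eps with E[eps] = 1 and E[eps^2] = mu2 is applied independently to every hidden ReLU activation (for inverted dropout with keep probability p, mu2 = 1/p); weights have variance sigma_w2 / fan-in; the inputs themselves are noise-free; and the "recently proposed initialisation scheme" is taken to be the critical choice sigma_w2 = 2 / mu2 with sigma_b2 = 0. Because the noise draws for two different inputs are independent, the noise only inflates the diagonal of the kernel. The helper names `relu_expectation` and `noisy_relu_nngp_kernel` are hypothetical.

```python
import numpy as np


def relu_expectation(k_ab, k_aa, k_bb):
    """E[relu(u) * relu(v)] for (u, v) ~ N(0, [[k_aa, k_ab], [k_ab, k_bb]]).

    Closed-form arc-cosine expression for the ReLU nonlinearity.
    """
    norm = np.sqrt(k_aa * k_bb)
    cos_theta = np.clip(k_ab / norm, -1.0, 1.0)
    theta = np.arccos(cos_theta)
    return norm / (2.0 * np.pi) * (np.sin(theta) + (np.pi - theta) * cos_theta)


def noisy_relu_nngp_kernel(X, depth, sigma_w2, sigma_b2=0.0, mu2=1.0):
    """Hypothetical helper: NNGP kernel of a deep ReLU net with multiplicative
    noise (mean 1, second moment mu2) on every hidden activation.

    For inverted dropout with keep probability p, mu2 = 1 / p.
    """
    d = X.shape[1]
    # First layer: linear map of the inputs (assumed noise-free here).
    K = sigma_w2 * (X @ X.T) / d + sigma_b2
    for _ in range(depth):
        q = np.diag(K).copy()
        # Cross-input terms: noise draws are independent, so E[eps_a * eps_b] = 1.
        K_next = sigma_w2 * relu_expectation(K, q[:, None], q[None, :]) + sigma_b2
        # Same-input terms: E[eps^2] = mu2, and E[relu(u)^2] = q / 2.
        np.fill_diagonal(K_next, sigma_w2 * mu2 * q / 2.0 + sigma_b2)
        K = K_next
    return K
```

Under these assumptions, choosing sigma_w2 = 2 / mu2 and sigma_b2 = 0 keeps the diagonal (the per-input variance) fixed across layers, while each noisy layer scales the output of the correlation map between distinct inputs by 1 / mu2. The off-diagonal correlations therefore stay below their noiseless counterparts and never exceed 1 / mu2.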