Deep Convolutional Networks as shallow Gaussian Processes
This work establishes a theoretical link between convolutional neural networks (CNNs) and Gaussian processes (GPs). It could benefit machine learning researchers by offering interpretable models with few hyperparameters, though the contribution is incremental: it extends prior results for dense networks to convolutional architectures.
The authors connect deep convolutional neural networks to Gaussian processes by showing that a CNN with an appropriate prior over its weights and biases becomes a Gaussian process in the limit of infinitely many convolutional filters. They demonstrate the result with a kernel equivalent to a 32-layer ResNet, which achieves 0.84% classification error on MNIST, a new record for GPs with a comparable number of parameters.
We show that the output of a (residual) convolutional neural network (CNN) with an appropriate prior over the weights and biases is a Gaussian process (GP) in the limit of infinitely many convolutional filters, extending similar results for dense networks. For a CNN, the equivalent kernel can be computed exactly and, unlike "deep kernels", has very few parameters: only the hyperparameters of the original CNN. Further, we show that this kernel has two properties that allow it to be computed efficiently; the cost of evaluating the kernel for a pair of images is similar to a single forward pass through the original CNN with only one filter per layer. The kernel equivalent to a 32-layer ResNet obtains 0.84% classification error on MNIST, a new record for GPs with a comparable number of parameters.
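As a rough illustration of why evaluating such a kernel for one pair of inputs can cost about as much as a forward pass with one filter per layer, the following is a minimal sketch and not the authors' implementation. It assumes several things the abstract does not state: ReLU nonlinearities, 1D single-channel inputs of equal length, "valid" convolutions without padding, a final dense layer, no residual connections, and a weight variance var_w that already absorbs any normalization by filter size. The function names relu_expectation, conv_sum, and cnn_gp_kernel are hypothetical.

```python
import math


def relu_expectation(kxx, kxy, kyy):
    """E[relu(u) * relu(v)] for zero-mean Gaussian (u, v) with
    Var(u) = kxx, Var(v) = kyy, Cov(u, v) = kxy (arc-cosine closed form)."""
    norm = math.sqrt(kxx * kyy)
    if norm == 0.0:
        return 0.0
    # Clamp to guard against rounding slightly outside [-1, 1].
    cos_theta = max(-1.0, min(1.0, kxy / norm))
    theta = math.acos(cos_theta)
    return norm / (2.0 * math.pi) * (
        math.sin(theta) + (math.pi - theta) * cos_theta
    )


def conv_sum(values, width):
    """Sum over every window of `width` consecutive positions ("valid" mode)."""
    return [sum(values[i:i + width]) for i in range(len(values) - width + 1)]


def cnn_gp_kernel(x, y, depth=2, width=3, var_w=1.0, var_b=0.1):
    """Kernel value k(x, y) for an assumed ReLU CNN with `depth` conv layers
    followed by a dense layer. x and y are equal-length lists of floats."""

    def affine(values):
        # Covariance of pre-activations: bias variance plus weight variance
        # times the sum of the previous layer's statistics over each patch.
        return [var_b + var_w * s for s in conv_sum(values, width)]

    # First conv layer: patch-wise inner products of the raw inputs.
    kxx = affine([a * a for a in x])
    kxy = affine([a * b for a, b in zip(x, y)])
    kyy = affine([b * b for b in y])

    # Remaining conv layers: apply the nonlinearity in expectation, then
    # sum over patches. Only one number per position is tracked for each of
    # (x, x), (x, y), (y, y), much like a network with one filter per layer.
    for _ in range(depth - 1):
        vxx = [relu_expectation(a, a, a) for a in kxx]
        vxy = [relu_expectation(a, c, b) for a, c, b in zip(kxx, kxy, kyy)]
        vyy = [relu_expectation(b, b, b) for b in kyy]
        kxx, kxy, kyy = affine(vxx), affine(vxy), affine(vyy)

    # Final dense layer: sum the post-nonlinearity covariance over positions.
    vxy = [relu_expectation(a, c, b) for a, c, b in zip(kxx, kxy, kyy)]
    return var_b + var_w * sum(vxy)


if __name__ == "__main__":
    x = [0.0, 1.0, 0.5, -0.3, 0.8, 0.2, -1.0, 0.4]
    y = [0.3, -0.2, 0.9, 0.1, 0.0, -0.5, 0.7, 1.0]
    print(cnn_gp_kernel(x, y), cnn_gp_kernel(x, x))
```

Under these assumptions the only hyperparameters of the kernel are depth, width, var_w, and var_b, which mirrors the abstract's remark that the kernel has no parameters beyond those of the original CNN. A full GP classifier would still need the kernel matrix over all training pairs, which this sketch does not build.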