Kernel Interpolation as a Bayes Point Machine
This work offers theoretical insight into generalization in neural networks, though the contribution is incremental: it builds on existing results from ensemble theory and convex geometry.
The paper shows that kernel interpolation acts as a Bayes point machine for Gaussian process classification, which enables the derivation of PAC-Bayes risk bounds. It suggests that this may explain generalization in large margin neural networks, and it reports supporting evidence from finite width networks.
A Bayes point machine is a single classifier that approximates the majority decision of an ensemble of classifiers. This paper observes that kernel interpolation is a Bayes point machine for Gaussian process classification. This observation facilitates the transfer of results from both ensemble theory and an area of convex geometry known as Brunn-Minkowski theory to derive PAC-Bayes risk bounds for kernel interpolation. Since large margin, infinite width neural networks are kernel interpolators, the paper's findings may help to explain generalisation in neural networks more broadly. Supporting this idea, the paper finds evidence that large margin, finite width neural networks behave like Bayes point machines too.
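The relationship between a single kernel interpolator and the majority vote of a Gaussian process ensemble can be illustrated with a small numerical sketch. The code below is a toy illustration and not the paper's procedure: the RBF kernel, the rejection sampler, the jitter values, and all function names (`rbf_kernel`, `kernel_interpolator`, `gp_majority_vote`) are assumptions made here for the example. It draws GP functions from the prior, keeps only those whose signs match the training labels, takes their majority vote at test points, and compares that vote with the sign of the minimum-norm kernel interpolator of the labels.

```python
import numpy as np


def rbf_kernel(A, B, lengthscale=1.0):
    """RBF kernel matrix between the rows of A and the rows of B (assumed kernel)."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-0.5 * np.maximum(sq, 0.0) / lengthscale**2)


def kernel_interpolator(X_train, y_train, X_test, jitter=1e-8):
    """Minimum-norm kernel interpolator of the +/-1 labels, evaluated at X_test."""
    K = rbf_kernel(X_train, X_train) + jitter * np.eye(len(X_train))
    alpha = np.linalg.solve(K, y_train)
    return rbf_kernel(X_test, X_train) @ alpha


def gp_majority_vote(X_train, y_train, X_test, n_samples=50_000, jitter=1e-6, seed=0):
    """Majority vote of GP prior samples whose signs agree with all training labels.

    Uses naive rejection sampling, which is only practical for a handful of
    training points. Returns the vote at each test point (0 marks a tie) and
    the number of accepted samples.
    """
    rng = np.random.default_rng(seed)
    X_all = np.vstack([X_train, X_test])
    K_all = rbf_kernel(X_all, X_all) + jitter * np.eye(len(X_all))
    L = np.linalg.cholesky(K_all)
    # Each row of f is one joint draw of the GP at the train and test points.
    f = rng.standard_normal((n_samples, len(X_all))) @ L.T
    n = len(X_train)
    keep = np.all(np.sign(f[:, :n]) == y_train, axis=1)
    votes = np.sign(f[keep, n:]).sum(axis=0)
    return np.sign(votes), int(keep.sum())


if __name__ == "__main__":
    # Four hand-picked training points labelled by the sign of the first coordinate.
    X_train = np.array([[-1.5, 0.3], [-0.5, -0.8], [0.7, 0.9], [1.6, -0.4]])
    y_train = np.array([-1.0, -1.0, 1.0, 1.0])
    X_test = np.random.default_rng(1).uniform(-2.0, 2.0, size=(10, 2))

    point_prediction = np.sign(kernel_interpolator(X_train, y_train, X_test))
    ensemble_vote, n_kept = gp_majority_vote(X_train, y_train, X_test)

    agreement = np.mean(point_prediction == ensemble_vote)
    print(f"accepted GP samples: {n_kept}")
    print(f"agreement between interpolator and majority vote: {agreement:.2f}")
```

The agreement fraction printed at the end gives a rough sense of how closely one interpolator tracks the ensemble decision on this toy problem. Rejection sampling is chosen only for transparency; with more training points the acceptance rate collapses and a proper posterior sampler would be needed.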