A Variational View on Bootstrap Ensembles as Bayesian Inference
This work addresses the problem of interpreting ensemble methods within a Bayesian framework, a question of interest to machine learning researchers. The contribution appears incremental, since it builds on existing variational and ensemble techniques.
The paper uses variational arguments to connect ensemble methods for neural networks to Bayesian inference. It shows that, under certain conditions, optimizing the ensemble members reduces the divergence to the posterior, and its experiments support ensembles as a viable alternative to approximate Bayesian inference.
In this paper, we employ variational arguments to establish a connection between ensemble methods for Neural Networks and Bayesian inference. We consider an ensemble-based scheme where each model/particle corresponds to a perturbation of the data by means of parametric bootstrap and a perturbation of the prior. We derive conditions under which any optimization step of the particles makes the associated distribution reduce its divergence to the posterior over model parameters. These conditions do not require any particular form for the approximation and are purely geometrical, giving insights into the behavior of the ensemble on a number of interesting models such as Neural Networks with ReLU activations. Experiments confirm that ensemble methods can be a valid alternative to approximate Bayesian inference; the theoretical developments in the paper seek to explain this behavior.
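To make the ensemble scheme concrete, the following is a minimal sketch on a toy model, not the paper's implementation. It assumes a one-parameter linear-Gaussian regression (y = theta * x + noise) with a zero-mean Gaussian prior, and it reads "perturbation of the data" as adding likelihood noise to the observed targets and "perturbation of the prior" as drawing a random prior center. The function names, the toy data, and the closed-form optimizer are illustrative assumptions. In this special case each optimized particle is known to be an exact posterior sample, which gives a simple way to check the ensemble against the analytic posterior.

```python
import random
import statistics


def exact_posterior(x, y, noise_var, prior_var):
    """Analytic posterior N(mean, var) for y = theta * x + N(0, noise_var),
    with prior theta ~ N(0, prior_var)."""
    precision = sum(xi * xi for xi in x) / noise_var + 1.0 / prior_var
    mean = (sum(xi * yi for xi, yi in zip(x, y)) / noise_var) / precision
    return mean, 1.0 / precision


def perturbed_map_ensemble(x, y, noise_var, prior_var, n_particles, seed=0):
    """Hypothetical helper: one particle per (data perturbation, prior perturbation).

    Each particle minimizes
        sum_i (y_tilde_i - theta * x_i)^2 / (2 * noise_var)
        + (theta - theta0)^2 / (2 * prior_var),
    which has a closed-form minimizer for this toy model.
    """
    rng = random.Random(seed)
    noise_sd = noise_var ** 0.5
    prior_sd = prior_var ** 0.5
    precision = sum(xi * xi for xi in x) / noise_var + 1.0 / prior_var
    particles = []
    for _ in range(n_particles):
        # Assumed data perturbation: add likelihood noise to each target.
        y_tilde = [yi + rng.gauss(0.0, noise_sd) for yi in y]
        # Assumed prior perturbation: draw the regularizer's center from the prior.
        theta0 = rng.gauss(0.0, prior_sd)
        rhs = (sum(xi * yi for xi, yi in zip(x, y_tilde)) / noise_var
               + theta0 / prior_var)
        particles.append(rhs / precision)
    return particles


if __name__ == "__main__":
    # Toy data, chosen only for illustration.
    x = [0.5, 1.0, 1.5, 2.0]
    y = [0.4, 1.1, 1.4, 2.1]
    particles = perturbed_map_ensemble(x, y, noise_var=0.25, prior_var=1.0,
                                       n_particles=20000)
    post_mean, post_var = exact_posterior(x, y, noise_var=0.25, prior_var=1.0)
    print("ensemble mean/var:", statistics.fmean(particles),
          statistics.pvariance(particles))
    print("posterior mean/var:", post_mean, post_var)
```

For nonlinear models such as ReLU networks the minimizer has no closed form and the particles are no longer exact posterior samples; that is the regime the paper's geometric conditions are meant to address, and this sketch does not cover it.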