LG · ML · Jun 14, 2015

Bayesian Dark Knowledge

arXiv:1506.04416v3 · 141 citations
AI Analysis

This paper addresses the need for efficient Bayesian neural networks in low-data or uncertainty-sensitive applications such as bandits and active learning. The contribution is incremental in that it builds on existing distillation and Monte Carlo techniques.

The paper tackles Bayesian parameter estimation for deep neural networks by distilling a Monte Carlo approximation of the posterior predictive density into a single network. The authors report better performance, a simpler implementation, and less test-time computation than two then-recent approaches based on expectation propagation and variational Bayes.

We consider the problem of Bayesian parameter estimation for deep neural networks, which is important in problem settings where we may have little data, and/or where we need accurate posterior predictive densities, e.g., for applications involving bandits or active learning. One simple approach to this is to use online Monte Carlo methods, such as SGLD (stochastic gradient Langevin dynamics). Unfortunately, such a method needs to store many copies of the parameters (which wastes memory), and needs to make predictions using many versions of the model (which wastes time). We describe a method for "distilling" a Monte Carlo approximation to the posterior predictive density into a more compact form, namely a single deep neural network. We compare to two very recent approaches to Bayesian neural networks, namely an approach based on expectation propagation [Hernandez-Lobato and Adams, 2015] and an approach based on variational Bayes [Blundell et al., 2015]. Our method performs better than both of these, is much simpler to implement, and uses less computation at test time.
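The two ingredients named in the abstract, SGLD sampling and distillation of the Monte Carlo posterior predictive into one model, can be illustrated with a toy sketch. This is not the authors' implementation. It assumes a logistic-regression "teacher" and "student" in place of deep networks, a Gaussian prior, a fixed step size, and distillation on the training inputs themselves after sampling has finished. All function names, hyperparameters, and the synthetic data are hypothetical choices made for illustration.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def sgld_samples(X, y, n_steps=2000, burn_in=500, thin=10,
                 batch_size=32, step=5e-3, prior_prec=1.0, seed=0):
    """Draw approximate posterior samples of logistic-regression weights via SGLD.

    Assumptions (not from the source): N(0, 1/prior_prec) prior, fixed step size.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    samples = []
    for t in range(n_steps):
        idx = rng.choice(n, size=batch_size, replace=False)
        Xb, yb = X[idx], y[idx]
        # Minibatch gradient of the log-likelihood, rescaled to the full data set.
        grad_log_lik = (n / batch_size) * (Xb.T @ (yb - sigmoid(Xb @ w)))
        grad_log_prior = -prior_prec * w
        # Langevin update: half a step along the gradient plus N(0, step) noise.
        noise = np.sqrt(step) * rng.normal(size=d)
        w = w + 0.5 * step * (grad_log_prior + grad_log_lik) + noise
        # Keep every `thin`-th iterate after burn-in as a posterior sample.
        if t >= burn_in and (t - burn_in) % thin == 0:
            samples.append(w.copy())
    return np.array(samples)


def posterior_predictive(X, samples):
    """Monte Carlo estimate of p(y=1 | x, data): average over sampled weights."""
    return sigmoid(X @ samples.T).mean(axis=1)


def cross_entropy(p, q, eps=1e-12):
    """Mean cross-entropy between target probabilities p and predictions q."""
    q = np.clip(q, eps, 1.0 - eps)
    return -np.mean(p * np.log(q) + (1.0 - p) * np.log(1.0 - q))


def distill(X, teacher_probs, n_steps=500, lr=0.5):
    """Fit one student weight vector to the teacher's averaged predictions
    by full-batch gradient descent on the cross-entropy."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_steps):
        student_probs = sigmoid(X @ w)
        w -= lr * (X.T @ (student_probs - teacher_probs)) / n
    return w


# Synthetic binary classification data (hypothetical, for illustration only).
rng = np.random.default_rng(1)
X = np.hstack([rng.normal(size=(200, 2)), np.ones((200, 1))])  # last column is a bias
w_true = np.array([1.5, -2.0, 0.3])
y = (rng.random(200) < sigmoid(X @ w_true)).astype(float)

samples = sgld_samples(X, y)                # many weight vectors (memory cost)
teacher = posterior_predictive(X, samples)  # one prediction per sample (time cost)
w_student = distill(X, teacher)             # a single compact model
student = sigmoid(X @ w_student)
print("mean |teacher - student|:", np.abs(teacher - student).mean())
```

At test time only `w_student` is needed, so the stored samples can be discarded and prediction takes one forward pass instead of one pass per sample. That is the memory and time saving the abstract describes.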

Code Implementations: 1 repo
