LG | ML | May 16, 2020

Generalized Bayesian Posterior Expectation Distillation for Deep Neural Networks

arXiv:2005.08110v1 | 21 citations
AI Analysis

This work addresses the challenge of efficiently deploying Bayesian neural networks for practitioners by compressing uncertainty estimates, though it is incremental as it builds on existing distillation frameworks.

The paper tackles the problem of distilling Bayesian posterior expectations from deep neural networks, extending prior work to compress posterior predictive distributions and expected entropy into student models, achieving competitive performance in downstream tasks like uncertainty ranking and out-of-distribution detection.

In this paper, we present a general framework for distilling expectations with respect to the Bayesian posterior distribution of a deep neural network classifier, extending prior work on the Bayesian Dark Knowledge framework. The proposed framework takes as input "teacher" and student model architectures and a general posterior expectation of interest. The distillation method performs an online compression of the selected posterior expectation using iteratively generated Monte Carlo samples. We focus on the posterior predictive distribution and expected entropy as distillation targets. We investigate several aspects of this framework including the impact of uncertainty and the choice of student model architecture. We study methods for student model architecture search from a speed-storage-accuracy perspective and evaluate downstream tasks leveraging entropy distillation including uncertainty ranking and out-of-distribution detection.
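As a rough illustration of the two distillation targets named in the abstract, the sketch below estimates the posterior predictive distribution and the expected entropy from a handful of Monte Carlo samples, and keeps a running mean so that samples need not be stored. It is not the paper's training procedure: the function names, the `RunningMean` helper, and the toy probabilities are all assumptions made for this example, and the student-network training step is left out entirely.

```python
import math


def posterior_predictive(sample_probs):
    """Average the class probabilities over Monte Carlo posterior samples."""
    n = len(sample_probs)
    k = len(sample_probs[0])
    return [sum(p[c] for p in sample_probs) / n for c in range(k)]


def entropy(probs):
    """Shannon entropy (in nats) of one categorical distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def expected_entropy(sample_probs):
    """Average the per-sample predictive entropy over posterior samples."""
    return sum(entropy(p) for p in sample_probs) / len(sample_probs)


class RunningMean:
    """Hypothetical helper: incremental mean of a stream of vectors."""

    def __init__(self, size):
        self.count = 0
        self.mean = [0.0] * size

    def update(self, values):
        # Standard incremental update: mean += (x - mean) / n
        self.count += 1
        self.mean = [m + (v - m) / self.count
                     for m, v in zip(self.mean, values)]
        return self.mean


# Toy class probabilities for one input under three sampled parameter
# settings (made-up numbers, for illustration only).
samples = [[0.9, 0.1], [0.6, 0.4], [0.3, 0.7]]

predictive = posterior_predictive(samples)
avg_entropy = expected_entropy(samples)

# The same predictive estimate, accumulated one sample at a time.
online = RunningMean(size=2)
for probs in samples:
    online.update(probs)
```

By Jensen's inequality, the entropy of the averaged prediction is never smaller than the expected entropy. The two quantities therefore carry different information, which is one reason to treat expected entropy as a distillation target in its own right.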

