A general framework for ensemble distribution distillation
This work addresses the need for leaner models that retain the uncertainty decomposition of ensembles. The contribution is incremental, since it builds on existing ensemble distillation methods.
The paper tackles a known problem: standard ensemble distillation erases the decomposition of uncertainty into aleatoric and epistemic components. It presents a general framework that preserves this decomposition while keeping predictive performance on par with standard distillation.
Ensembles of neural networks have been shown to give better performance than single networks, both in terms of predictions and uncertainty estimation. Additionally, ensembles allow the uncertainty to be decomposed into aleatoric (data) and epistemic (model) components, giving a more complete picture of the predictive uncertainty. Ensemble distillation is the process of compressing an ensemble into a single model, often resulting in a leaner model that still outperforms the individual ensemble members. Unfortunately, standard distillation erases the natural uncertainty decomposition of the ensemble. We present a general framework for distilling both regression and classification ensembles in a way that preserves the decomposition. We demonstrate the desired behaviour of our framework and show that its predictive performance is on par with standard distillation.
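The decomposition the abstract refers to can be made concrete with a small sketch. The code below uses two commonly used formulations rather than the paper's own implementation. For classification, the entropy of the averaged prediction splits into the mean member entropy (aleatoric) plus the mutual information (epistemic). For regression with Gaussian members, the law of total variance splits the variance into the mean member variance (aleatoric) plus the variance of the member means (epistemic). The function names are hypothetical. The sketch also assumes that a standard distilled student is fit only to the ensemble's averaged prediction. Under that assumption, two ensembles with the same average but very different epistemic uncertainty become indistinguishable to the student.

```python
import math
from typing import List, Sequence, Tuple


def entropy(p: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a categorical distribution."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0.0)


def classification_decomposition(
    member_probs: List[Sequence[float]],
) -> Tuple[float, float, float]:
    """Hypothetical helper: split ensemble predictive entropy.

    total     = H[mean_m p_m]       (entropy of the averaged prediction)
    aleatoric = mean_m H[p_m]       (expected entropy of the members)
    epistemic = total - aleatoric   (mutual information, >= 0)
    """
    m = len(member_probs)
    k = len(member_probs[0])
    mean_p = [sum(p[c] for p in member_probs) / m for c in range(k)]
    total = entropy(mean_p)
    aleatoric = sum(entropy(p) for p in member_probs) / m
    return total, aleatoric, total - aleatoric


def regression_decomposition(
    means: Sequence[float], variances: Sequence[float]
) -> Tuple[float, float, float]:
    """Hypothetical helper: split the variance of an equally weighted
    mixture of Gaussian ensemble members (law of total variance).

    aleatoric = mean of member variances
    epistemic = variance of member means
    total     = aleatoric + epistemic
    """
    m = len(means)
    mu = sum(means) / m
    aleatoric = sum(variances) / m
    epistemic = sum((x - mu) ** 2 for x in means) / m
    return aleatoric + epistemic, aleatoric, epistemic


def rounded(t: Tuple[float, ...]) -> Tuple[float, ...]:
    return tuple(round(x, 3) for x in t)


# Both ensembles average to [0.5, 0.5], so a student fit only to the
# averaged prediction sees the same target in both cases.
disagreeing = [[0.99, 0.01], [0.01, 0.99]]  # confident members that disagree
agreeing = [[0.5, 0.5], [0.5, 0.5]]         # members that agree on being unsure

print(rounded(classification_decomposition(disagreeing)))  # (0.693, 0.056, 0.637)
print(rounded(classification_decomposition(agreeing)))     # (0.693, 0.693, 0.0)

# Regression analogue: same mixture mean (1.0) and total variance (2.0).
print(regression_decomposition([0.0, 2.0], [1.0, 1.0]))  # (2.0, 1.0, 1.0)
print(regression_decomposition([1.0, 1.0], [2.0, 2.0]))  # (2.0, 2.0, 0.0)
```

In both the classification and the regression case the total uncertainty is identical, but the aleatoric/epistemic split differs. Recovering this split is what requires the student to model more than the ensemble's averaged prediction.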