Is BatchEnsemble a Single Model? On Calibration and Diversity of Efficient Ensembles
This work addresses the need for efficient uncertainty estimation in resource-constrained settings. It finds that BatchEnsemble offers at most an incremental improvement over a single model and may not deliver the benefits that practitioners expect from ensembles.
The paper asks whether BatchEnsemble provides effective ensemble-like uncertainty estimates. It finds that BatchEnsemble underperforms Deep Ensembles and behaves much like a single model in accuracy, calibration, and out-of-distribution detection on CIFAR10/10C/SVHN, and that in a controlled MNIST study its members are nearly identical in both function and parameter space.
In resource-constrained and low-latency settings, uncertainty estimates must be obtained efficiently. Deep Ensembles provide robust epistemic uncertainty (EU) but require training multiple full-size models. BatchEnsemble aims to deliver ensemble-like EU at far lower parameter and memory cost by applying learned rank-1 perturbations to a shared base network. We show that BatchEnsemble not only underperforms Deep Ensembles but also closely tracks a single-model baseline in terms of accuracy, calibration, and out-of-distribution (OOD) detection on CIFAR10/10C/SVHN. A controlled study on MNIST finds that members are near-identical in function and parameter space, indicating limited capacity to realize distinct predictive modes. Thus, BatchEnsemble behaves more like a single model than a true ensemble.
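
To make the rank-1 perturbation concrete, the following is a minimal NumPy sketch of one BatchEnsemble-style dense layer, in which member i uses the effective weight W ∘ (r_i s_iᵀ) built from a shared matrix W. The function names, the per-member bias, and the pairwise disagreement score are illustrative assumptions; they are not taken from the paper's code, and the disagreement score is only one possible proxy for function-space diversity.

```python
import numpy as np


def batchensemble_linear(x, W, r, s, b):
    """Dense layer evaluated for every ensemble member (hypothetical helper).

    x: (batch, d_in)     inputs shared by all members
    W: (d_in, d_out)     shared weight matrix
    r: (members, d_in)   per-member input scaling vectors
    s: (members, d_out)  per-member output scaling vectors
    b: (members, d_out)  per-member biases (an assumption of this sketch)
    Returns an array of shape (members, batch, d_out).
    """
    # ((x * r_i) @ W) * s_i equals x @ (W * outer(r_i, s_i)),
    # so the rank-1 perturbed weight is never materialized.
    scaled_in = x[None, :, :] * r[:, None, :]   # (members, batch, d_in)
    shared = scaled_in @ W                      # (members, batch, d_out)
    return shared * s[:, None, :] + b[:, None, :]


def disagreement(logits):
    """Mean pairwise fraction of inputs on which two members predict different classes.

    logits: (members, batch, classes). Illustrative diversity proxy only.
    """
    preds = logits.argmax(axis=-1)              # (members, batch)
    m = preds.shape[0]
    rates = [
        np.mean(preds[i] != preds[j])
        for i in range(m)
        for j in range(i + 1, m)
    ]
    return float(np.mean(rates))
```

When every r_i and s_i is the same vector and the biases match, all members compute the same function and the disagreement score is exactly zero. That is the degenerate case the abstract describes as members being near-identical in function and parameter space.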