Diverse Ensembles Improve Calibration
This work addresses a calibration problem that matters to deep learning practitioners, but the contribution is incremental because it builds on existing ensemble methods.
The paper tackles poorly calibrated predictions in deep neural networks, especially under distribution shift. It proposes a simple ensemble technique that trains each member with a different data augmentation and mixes un-augmented with augmented inputs, and it reports improved calibration and accuracy on the CIFAR benchmarks and their corrupted versions.
Modern deep neural networks can produce badly calibrated predictions, especially when train and test distributions are mismatched. Training an ensemble of models and averaging their predictions can help alleviate these issues. We propose a simple technique to improve calibration, using a different data augmentation for each ensemble member. We additionally use the idea of "mixing" un-augmented and augmented inputs to improve calibration when test and training distributions are the same. These simple techniques improve calibration and accuracy over strong baselines on the CIFAR10 and CIFAR100 benchmarks, and on out-of-domain data from their corrupted versions.
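
The three ingredients named in the abstract (averaging member predictions, mixing un-augmented and augmented inputs, and measuring calibration) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names are hypothetical, the convex combination in `mix_inputs` is only one assumed reading of "mixing", and expected calibration error (ECE) with equal-width bins is assumed here as the calibration metric because the text above does not name one.

```python
import numpy as np


def mix_inputs(x_clean, x_aug, lam):
    """Blend an un-augmented and an augmented input.

    Assumption: "mixing" is modelled as a convex combination with
    weight lam in [0, 1]; the source text does not specify the form.
    """
    return lam * x_clean + (1.0 - lam) * x_aug


def ensemble_average(member_probs):
    """Average class probabilities over ensemble members.

    member_probs has shape (members, examples, classes); each member
    is assumed to have been trained with its own augmentation.
    """
    return np.mean(member_probs, axis=0)


def expected_calibration_error(probs, labels, n_bins=10):
    """ECE with equal-width confidence bins (assumed metric)."""
    confidences = probs.max(axis=1)
    predictions = probs.argmax(axis=1)
    correct = (predictions == labels).astype(float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            # Gap between accuracy and mean confidence in this bin,
            # weighted by the fraction of examples that fall in it.
            gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
            ece += in_bin.mean() * gap
    return ece


# Toy usage: two hypothetical members, two examples, two classes.
member_probs = np.array([
    [[0.9, 0.1], [0.6, 0.4]],
    [[0.5, 0.5], [0.8, 0.2]],
])
avg_probs = ensemble_average(member_probs)
labels = np.array([0, 1])
ece = expected_calibration_error(avg_probs, labels)
```

In the toy usage both averaged predictions come out as class 0 with confidence 0.7, so the ensemble is less confident than the more extreme member on each example. That softening is the kind of effect that averaging members is meant to have on calibration.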