Improving Deep Ensembles by Estimating Confusion Matrices
This work addresses the challenge of optimizing ensemble performance in deep learning for applications requiring reliable uncertainty estimates, though it is incremental as it builds on existing crowdsourcing techniques.
The paper tackles the problem of improving deep ensembles by proposing a new aggregation method called soft Dawid Skene. The method estimates a confusion matrix for each ensemble member and weights members by their inferred performance, which yields better accuracy, calibration, and out-of-distribution detection than traditional ensemble averaging.
Ensembling in deep learning improves accuracy and calibration over single networks. The traditional aggregation approach, ensemble averaging, treats all individual networks equally by averaging their outputs. Inspired by crowdsourcing, we propose an aggregation method for deep ensembles called soft Dawid Skene, which estimates the confusion matrices of ensemble members and weights them according to their inferred performance. Soft Dawid Skene aggregates soft labels, in contrast to the hard labels often used in crowdsourcing. In extensive experiments, we show empirically that soft Dawid Skene outperforms ensemble averaging in accuracy, calibration, and out-of-distribution detection.
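To make the idea of confusion-matrix-based aggregation concrete, the following is a minimal sketch of an EM-style Dawid Skene variant that treats each member's softmax output as fractional votes. It is an illustrative assumption, not the paper's exact algorithm: the function name `soft_dawid_skene_sketch`, the initialization by ensemble averaging, the smoothing constant `eps`, and the fixed iteration count are all hypothetical choices.

```python
import numpy as np

def soft_dawid_skene_sketch(probs, n_iter=20, eps=1e-8):
    """EM-style aggregation of soft labels (illustrative sketch only).

    probs: array of shape (M, N, K) holding the softmax outputs of
           M ensemble members on N examples with K classes.
    Returns the aggregated posterior (N, K) and the estimated
    confusion matrices (M, K, K). Requires n_iter >= 1.
    """
    probs = np.asarray(probs, dtype=float)
    # Assumption: start from plain ensemble averaging.
    q = probs.mean(axis=0)
    for _ in range(n_iter):
        # M-step: class prior from the current posterior.
        prior = q.mean(axis=0) + eps
        prior /= prior.sum()
        # conf[m, j, l]: expected mass member m assigns to class l
        # when the (latent) true class is j.
        conf = np.einsum("nj,mnl->mjl", q, probs) + eps
        conf /= conf.sum(axis=2, keepdims=True)
        # E-step: soft labels enter the log-likelihood as fractional votes.
        log_q = np.log(prior)[None, :] + np.einsum(
            "mnl,mjl->nj", probs, np.log(conf)
        )
        log_q -= log_q.max(axis=1, keepdims=True)  # numerical stability
        q = np.exp(log_q)
        q /= q.sum(axis=1, keepdims=True)
    return q, conf
```

In this sketch, a member whose output does not depend on the input ends up with near-identical confusion-matrix rows, so it contributes the same term to every class score and has essentially no influence on the aggregated prediction. Ensemble averaging, by contrast, would give that member the same weight as every other member.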