LG · ML · Mar 10, 2025

Improving Deep Ensembles by Estimating Confusion Matrices

arXiv:2503.07119v1 · 2 citations · h-index: 7
Originality: Incremental advance
AI Analysis

This work addresses how to improve the performance of deep ensembles in applications that require reliable uncertainty estimates. It is an incremental advance, since it adapts existing crowdsourcing techniques rather than introducing a new framework.

The paper proposes a new aggregation method for deep ensembles, soft Dawid Skene, which estimates a confusion matrix for each ensemble member and weights the members according to their inferred performance. It reports better accuracy, calibration, and out-of-distribution detection than traditional ensemble averaging.

Ensembling deep networks improves accuracy and calibration over a single network. The traditional aggregation approach, ensemble averaging, treats all individual networks equally by averaging their outputs. Inspired by crowdsourcing, we propose an aggregation method for deep ensembles called soft Dawid Skene, which estimates the confusion matrices of the ensemble members and weights them according to their inferred performance. Soft Dawid Skene aggregates soft labels, in contrast to the hard labels often used in crowdsourcing. In extensive experiments, we show empirically that soft Dawid Skene outperforms ensemble averaging in accuracy, calibration, and out-of-distribution detection.
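The abstract does not give the update equations, so the following is only a minimal sketch under our own assumptions, not the authors' implementation. It takes the classic Dawid Skene EM model, replaces each member's one-hot vote with its softmax vector, and runs unsupervised on a batch of member predictions, next to the uniform ensemble-averaging baseline. The function names `ensemble_average` and `soft_dawid_skene`, the smoothing constant `eps`, the iteration count, and the initialisation from the ensemble average are all hypothetical illustration choices.

```python
import math
from typing import List

# probs[i][j][l]: softmax probability that ensemble member j assigns to class l on sample i
Probs = List[List[List[float]]]


def ensemble_average(probs: Probs) -> List[List[float]]:
    """Baseline aggregation: every member gets the same weight 1/M."""
    n_members = len(probs[0])
    n_classes = len(probs[0][0])
    return [
        [sum(member[l] for member in sample) / n_members for l in range(n_classes)]
        for sample in probs
    ]


def soft_dawid_skene(probs: Probs, n_iter: int = 50, eps: float = 1e-6) -> List[List[float]]:
    """Hypothetical EM sketch of a Dawid Skene model fed with soft labels (assumed formulation).

    theta[j][k][l] approximates P(member j outputs class l | true class k);
    the one-hot votes of classic Dawid Skene are replaced by soft probabilities.
    Returns the posterior over the true class for every sample.
    """
    n_samples, n_members, n_classes = len(probs), len(probs[0]), len(probs[0][0])
    post = ensemble_average(probs)  # assumption: initialise posteriors with the plain average
    for _ in range(n_iter):
        # M-step: class prior from the current posteriors (eps-smoothed)
        prior = [
            (sum(post[i][k] for i in range(n_samples)) + eps) / (n_samples + n_classes * eps)
            for k in range(n_classes)
        ]
        # M-step: soft confusion counts per member, then normalise each row over l
        theta = []
        for j in range(n_members):
            rows = []
            for k in range(n_classes):
                counts = [
                    sum(post[i][k] * probs[i][j][l] for i in range(n_samples)) + eps
                    for l in range(n_classes)
                ]
                total = sum(counts)
                rows.append([c / total for c in counts])
            theta.append(rows)
        # E-step: log-posterior of each true class; reliable members (near-diagonal
        # theta) contribute larger log-likelihood ratios, i.e. more weight
        new_post = []
        for i in range(n_samples):
            logits = []
            for k in range(n_classes):
                score = math.log(prior[k])
                for j in range(n_members):
                    for l in range(n_classes):
                        score += probs[i][j][l] * math.log(theta[j][k][l])
                logits.append(score)
            top = max(logits)  # subtract the max for numerical stability
            exps = [math.exp(s - top) for s in logits]
            z = sum(exps)
            new_post.append([e / z for e in exps])
        post = new_post
    return post
```

In this sketch the weighting is implicit: a member whose estimated confusion matrix is close to the identity moves the posterior more than one whose matrix is close to uniform, whereas `ensemble_average` gives every member weight 1/M. Estimating `theta` on a labelled validation set instead of with unsupervised EM would be an equally plausible variant; the abstract does not say which one the paper uses.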
