LG · CV · IT · Jan 14, 2021

DICE: Diversity in Deep Ensembles via Conditional Redundancy Adversarial Estimation

arXiv:2101.05544v1 · 60 citations
Originality: Highly original
AI Analysis

The paper addresses the problem of balancing ensemble diversity against individual member accuracy in deep ensembles, offering an incremental improvement over existing diversity regularization methods.

The paper tackles the trade-off between ensemble diversity and individual accuracy in deep ensembles by introducing DICE, a training criterion that reduces spurious correlations among members' features via adversarial estimation of conditional redundancy. It reports state-of-the-art accuracy on CIFAR-10/100; for example, a 5-network ensemble trained with DICE matches a 7-network ensemble trained independently.

Deep ensembles perform better than a single network thanks to the diversity among their members. Recent approaches regularize predictions to increase diversity; however, they also drastically decrease individual members' performances. In this paper, we argue that learning strategies for deep ensembles need to tackle the trade-off between ensemble diversity and individual accuracies. Motivated by arguments from information theory and leveraging recent advances in neural estimation of conditional mutual information, we introduce a novel training criterion called DICE: it increases diversity by reducing spurious correlations among features. The main idea is that features extracted from pairs of members should only share information useful for target class prediction without being conditionally redundant. Therefore, besides the classification loss with information bottleneck, we adversarially prevent features from being conditionally predictable from each other. We manage to reduce simultaneous errors while protecting class information. We obtain state-of-the-art accuracy results on CIFAR-10/100: for example, an ensemble of 5 networks trained with DICE matches an ensemble of 7 networks trained independently. We further analyze the consequences on calibration, uncertainty estimation, out-of-distribution detection and online co-distillation.
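The abstract's notion of features being "conditionally redundant" can be made concrete with a small sketch. To estimate how much two members' features share beyond the class label, an adversary needs two kinds of feature pairs: joint pairs, taken from the same input, and pairs that agree only on the class. One common way to approximate the second kind, assumed here and not confirmed by this page as the paper's exact procedure, is to shuffle sample indices within each class. The helper name `conditional_shuffle` and the toy labels are hypothetical.

```python
import random
from collections import defaultdict


def conditional_shuffle(labels, seed=0):
    """Return a permutation `perm` with labels[perm[i]] == labels[i].

    Pairing member 1's features for sample i with member 2's features
    for sample perm[i] yields pairs that share the class label but come
    from (possibly) different inputs. This is an assumed approximation
    of sampling from p(z1 | y) * p(z2 | y).
    """
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)

    # Shuffle indices independently inside each class.
    perm = list(range(len(labels)))
    for indices in by_class.values():
        shuffled = indices[:]
        rng.shuffle(shuffled)
        for src, dst in zip(indices, shuffled):
            perm[src] = dst
    return perm


# Toy batch of class labels (hypothetical data).
labels = [0, 1, 0, 2, 1, 0, 2, 1]
perm = conditional_shuffle(labels)

# Index pairs (member-1 sample, member-2 sample).
joint_pairs = [(i, i) for i in range(len(labels))]
product_pairs = [(i, perm[i]) for i in range(len(labels))]
```

If an adversary trained on such pairs cannot tell `joint_pairs` from `product_pairs`, the two members' features carry little shared information beyond the class, which is the property the abstract describes as desirable.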

