The Disparate Benefits of Deep Ensembles
This work addresses fairness in machine learning applications such as facial analysis and medical imaging, highlighting an incremental but important risk in a widely used ensembling technique.
The paper investigates how Deep Ensembles, which boost predictive performance, benefit socially relevant groups unevenly. It calls this the disparate benefits effect and shows that it affects group fairness metrics such as statistical parity and equal opportunity. It finds that per-group differences in the predictive diversity of ensemble members explain this effect, and that the Hardt post-processing method mitigates it effectively.
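To make these quantities concrete, here is a minimal sketch that computes the statistical parity difference and the equal opportunity difference for a single member and for the averaged ensemble, together with a per-group disagreement rate among members. It assumes a binary task, a binary protected attribute, and an ensemble that averages member probabilities. The pairwise disagreement rate is a hypothetical proxy for predictive diversity and may not be the measure used in the paper. All data are made up for illustration.

```python
import numpy as np

def statistical_parity_difference(y_pred, group):
    """|P(yhat=1 | g=0) - P(yhat=1 | g=1)| for a binary protected attribute."""
    return float(abs(y_pred[group == 0].mean() - y_pred[group == 1].mean()))

def equal_opportunity_difference(y_true, y_pred, group):
    """|TPR(g=0) - TPR(g=1)|: gap in true positive rates between the groups."""
    tpr = [y_pred[(group == g) & (y_true == 1)].mean() for g in (0, 1)]
    return float(abs(tpr[0] - tpr[1]))

def per_group_disagreement(member_probs, group, threshold=0.5):
    """Mean pairwise disagreement of members' hard predictions within each group.

    member_probs has shape (M, N): member m's predicted P(y=1 | x_n).
    Disagreement is only one possible proxy for predictive diversity (assumption).
    """
    preds = (member_probs >= threshold).astype(int)
    m = preds.shape[0]
    result = {}
    for g in np.unique(group):
        p = preds[:, group == g]
        rates = [(p[i] != p[j]).mean() for i in range(m) for j in range(i + 1, m)]
        result[int(g)] = float(np.mean(rates))
    return result

# Hypothetical toy data: 3 members, 6 samples, binary protected attribute.
member_probs = np.array([
    [0.9, 0.6, 0.3, 0.8, 0.2, 0.1],
    [0.8, 0.4, 0.6, 0.7, 0.3, 0.2],
    [0.7, 0.3, 0.7, 0.9, 0.1, 0.3],
])
y_true = np.array([1, 0, 1, 1, 1, 0])
group = np.array([0, 0, 0, 1, 1, 1])

member_pred = (member_probs[0] >= 0.5).astype(int)              # one member alone
ensemble_pred = (member_probs.mean(axis=0) >= 0.5).astype(int)  # average of all members

for name, pred in [("member 0", member_pred), ("ensemble", ensemble_pred)]:
    print(name,
          "SPD:", round(statistical_parity_difference(pred, group), 3),
          "EOD:", round(equal_opportunity_difference(y_true, pred, group), 3))
print("disagreement per group:", per_group_disagreement(member_probs, group))
```

In this toy data only group 0's members disagree, and only group 0's true positive rate rises when the members are averaged (from 0.5 to 1.0). As a result, the equal opportunity difference grows from 0.0 for member 0 to 0.5 for the ensemble, a small-scale picture of a benefit that accrues unevenly across groups.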
Ensembles of Deep Neural Networks (Deep Ensembles) are widely used as a simple way to boost predictive performance. However, their impact on algorithmic fairness is not yet well understood. Algorithmic fairness examines how a model's performance varies across socially relevant groups defined by protected attributes such as age, gender, or race. In this work, we explore the interplay between the performance gains from Deep Ensembles and fairness. Our analysis reveals that they unevenly favor different groups, a phenomenon that we term the disparate benefits effect. We empirically investigate this effect using popular facial analysis and medical imaging datasets with protected group attributes and find that it affects multiple established group fairness metrics, including statistical parity and equal opportunity. Furthermore, we find that per-group differences in the predictive diversity of ensemble members can explain this effect. Finally, we demonstrate that the classical Hardt post-processing method is particularly effective at mitigating the disparate benefits effect of Deep Ensembles by leveraging their better-calibrated predictive distributions.
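The idea behind threshold post-processing for equal opportunity can be sketched as choosing a separate decision threshold per group on the ensemble's averaged probabilities, so that the groups' true positive rates match. This is a simplified, deterministic stand-in, not Hardt et al.'s full method: the original can randomize between thresholds and picks the operating point by minimizing expected loss, whereas here `target_tpr` is simply given. The helper names (`tpr_at`, `equal_opportunity_thresholds`, `apply_thresholds`) and the data are hypothetical.

```python
import numpy as np

def tpr_at(scores, y_true, thr):
    """True positive rate when predicting 1 for every score >= thr."""
    return float((scores[y_true == 1] >= thr).mean())

def equal_opportunity_thresholds(scores, y_true, group, target_tpr):
    """Per group, pick the observed score whose TPR is closest to target_tpr.

    Simplified, deterministic stand-in for Hardt et al.'s post-processing:
    no randomized thresholds and no explicit loss minimization.
    Assumes every group contains at least one positive example.
    """
    thresholds = {}
    for g in np.unique(group):
        s, y = scores[group == g], y_true[group == g]
        # Descending order, so ties resolve to the highest threshold
        # (the fewest additional positive predictions).
        candidates = np.unique(s)[::-1]
        gaps = [abs(tpr_at(s, y, t) - target_tpr) for t in candidates]
        thresholds[int(g)] = float(candidates[int(np.argmin(gaps))])
    return thresholds

def apply_thresholds(scores, group, thresholds):
    """Binary predictions using each sample's group-specific threshold."""
    return np.array([int(s >= thresholds[int(g)]) for s, g in zip(scores, group)])

# Hypothetical ensemble-averaged probabilities for two groups of four samples.
scores = np.array([0.9, 0.7, 0.55, 0.2, 0.6, 0.45, 0.4, 0.1])
y_true = np.array([1, 1, 1, 0, 1, 1, 1, 0])
group = np.array([0, 0, 0, 0, 1, 1, 1, 1])

target = tpr_at(scores, y_true, 0.5)  # overall TPR at the default cut-off
thresholds = equal_opportunity_thresholds(scores, y_true, group, target)
post_pred = apply_thresholds(scores, group, thresholds)

for g in (0, 1):
    m = group == g
    print(f"group {g}: TPR@0.5 = {tpr_at(scores[m], y_true[m], 0.5):.3f}, "
          f"TPR after = {tpr_at(scores[m], y_true[m], thresholds[g]):.3f}")
```

Here the thresholds come out as 0.7 for group 0 and 0.45 for group 1, and both groups end at a TPR of about 0.667, down from 1.0 and up from 0.333 respectively. With this target, group 0 is levelled down; choosing a higher `target_tpr` would instead lower group 1's threshold further. Continuous, well-calibrated ensemble scores give many distinct candidate thresholds, which is plausibly one reason such post-processing works well on Deep Ensembles.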