LG | AI | ML | Dec 13, 2024

Who's the (Multi-)Fairest of Them All: Rethinking Interpolation-Based Data Augmentation Through the Lens of Multicalibration

CMU
arXiv:2412.10575v2 | 2 citations | h-index: 1 | AAAI
Originality: Incremental advance
AI Analysis

This work addresses fairness in machine learning for classification tasks with multiple minority groups, revealing limitations in existing methods and offering a more rigorous evaluation approach.

The paper uses multicalibration to re-evaluate the fairness of interpolation-based data augmentation methods such as Fair Mixup. It finds that Fair Mixup often worsens both performance and fairness, whereas vanilla Mixup outperforms it, especially on small groups. Combining vanilla Mixup with multicalibration post-processing improves fairness further.
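For readers unfamiliar with vanilla Mixup, the following is a minimal NumPy sketch of the standard formulation: each training example is replaced by a convex combination of itself and a randomly chosen partner, with the mixing weight drawn from a Beta(alpha, alpha) distribution. The function name `mixup_batch`, the default `alpha=0.4`, and the per-batch (rather than per-example) mixing weight are illustrative assumptions, not details taken from the paper or its repository.

```python
import numpy as np

def mixup_batch(x, y, alpha=0.4, rng=None):
    """Return a Mixup-augmented copy of a batch (illustrative sketch).

    x: float array of shape (n, d) with input features.
    y: float array of shape (n, k) with one-hot or soft labels.
    alpha: Beta distribution parameter (assumed value, not from the paper).
    """
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)        # single mixing weight in [0, 1]
    perm = rng.permutation(len(x))      # random partner for each example
    x_mix = lam * x + (1.0 - lam) * x[perm]
    y_mix = lam * y + (1.0 - lam) * y[perm]
    return x_mix, y_mix, lam
```

Because inputs and labels are interpolated with the same weight, mixed one-hot labels remain valid probability vectors.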

Data augmentation methods, especially state-of-the-art interpolation-based methods such as Fair Mixup, have been widely shown to increase model fairness. However, this fairness is evaluated on metrics that do not capture model uncertainty and on datasets with only one, relatively large, minority group. As a remedy, multicalibration has been introduced to measure fairness while accommodating uncertainty and accounting for multiple minority groups. However, existing methods of improving multicalibration involve reducing initial training data to create a holdout set for post-processing, which is not ideal when minority training data is already sparse. This paper uses multicalibration to more rigorously examine data augmentation for classification fairness. We stress-test four versions of Fair Mixup on two structured data classification problems with up to 81 marginalized groups, evaluating multicalibration violations and balanced accuracy. We find that on nearly every experiment, Fair Mixup worsens baseline performance and fairness, but the simple vanilla Mixup outperforms both Fair Mixup and the baseline, especially when calibrating on small groups. Combining vanilla Mixup with multicalibration post-processing, which enforces multicalibration through post-processing on a holdout set, further increases fairness.
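To make "multicalibration violation" concrete, here is a minimal sketch of one common binned formulation: for every group and every prediction bin, compare the mean predicted probability with the observed positive rate, and report the largest gap. This is an assumption-laden illustration; the paper's exact metric, binning scheme, and group weighting may differ, and the names `multicalibration_violation`, `n_bins`, and `min_count` are hypothetical.

```python
import numpy as np

def multicalibration_violation(probs, labels, groups, n_bins=10, min_count=1):
    """Worst-case calibration gap over (group, prediction-bin) cells.

    probs:  predicted positive-class probabilities, shape (n,).
    labels: binary outcomes in {0, 1}, shape (n,).
    groups: dict mapping a group name to a boolean mask of shape (n,);
            groups may overlap.
    """
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=float)
    # Equal-width bins over [0, 1]; a prediction of exactly 1.0 goes in the last bin.
    bins = np.minimum((probs * n_bins).astype(int), n_bins - 1)
    worst = 0.0
    for mask in groups.values():
        mask = np.asarray(mask, dtype=bool)
        for b in range(n_bins):
            cell = mask & (bins == b)
            if cell.sum() < min_count:
                continue  # skip empty or too-small cells
            gap = abs(labels[cell].mean() - probs[cell].mean())
            worst = max(worst, gap)
    return worst
```

A predictor can look calibrated on the whole population while being miscalibrated on a subgroup: with all predictions at 0.5 and labels `[1, 1, 0, 0]`, the overall gap is zero, but a group covering only the first two examples has a gap of 0.5.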

Code Implementations: 1 repo