Adapting Fairness Interventions to Missing Values
This work addresses fairness in machine learning applications where real-world data contain missing values, offering a scalable way to mitigate discrimination that standard missing-data handling can exacerbate.
The paper examines how missing values compromise algorithmic fairness, showing that the standard impute-then-classify approach can worsen both group fairness and accuracy. It presents adaptive algorithms that, when combined with existing fairness interventions, achieve higher fairness and accuracy than impute-then-classify across different datasets.
Missing values in real-world data pose a significant and unique challenge to algorithmic fairness. Different demographic groups may be unequally affected by missing data, and the standard procedure for handling missing values, in which data are first imputed and the imputed data are then used for classification (a procedure referred to as "impute-then-classify"), can exacerbate discrimination. In this paper, we analyze how missing values affect algorithmic fairness. We first prove that training a classifier from imputed data can significantly worsen the achievable values of group fairness and average accuracy. This is because imputing data results in the loss of the missing pattern of the data, which often conveys information about the predictive label. We present scalable and adaptive algorithms for fair classification with missing values. These algorithms can be combined with any preexisting fairness-intervention algorithm to handle all possible missing patterns while preserving information encoded within the missing patterns. Numerical experiments with state-of-the-art fairness interventions demonstrate that our adaptive algorithms consistently achieve higher fairness and accuracy than impute-then-classify across different datasets.
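The claim that imputation discards the missing pattern can be illustrated with a toy example. The sketch below is not the paper's algorithm: the data are invented for illustration, and the helper names `mean_impute` and `best_achievable_accuracy` are hypothetical. It compares the best accuracy any deterministic classifier could reach from a mean-imputed feature alone against the best accuracy reachable when a missingness indicator is kept alongside the imputed value. It covers only accuracy, not group fairness.

```python
from collections import Counter, defaultdict


def mean_impute(values):
    """Replace None entries with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


def best_achievable_accuracy(features, labels):
    """Accuracy of the best deterministic classifier that sees only `features`.

    For each distinct feature value, the best classifier predicts the
    majority label among the rows sharing that value.
    """
    counts = defaultdict(Counter)
    for f, y in zip(features, labels):
        counts[f][y] += 1
    correct = sum(max(c.values()) for c in counts.values())
    return correct / len(labels)


# Invented toy data: None marks a missing value. Rows with a missing
# feature mostly carry label 1, so missingness itself is predictive.
x = [0.0, 1.0, 1.0, 2.0, None, None, None, None]
y = [0, 0, 0, 1, 1, 1, 1, 0]

x_imputed = mean_impute(x)           # missing entries become 1.0
mask = [v is None for v in x]        # the missing pattern

# Impute-then-classify: the classifier sees only the imputed feature,
# so imputed rows are indistinguishable from observed rows with x = 1.0.
acc_imputed = best_achievable_accuracy(x_imputed, y)

# Keeping the indicator preserves the missing pattern.
acc_with_mask = best_achievable_accuracy(list(zip(x_imputed, mask)), y)

print(acc_imputed)    # 0.625
print(acc_with_mask)  # 0.875
```

In this toy setting, the imputed rows collide with observed rows at the mean, so no classifier on the imputed feature alone can separate them. Retaining the indicator restores the distinction.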