Error Parity Fairness: Testing for Group Fairness in Regression Tasks
This work addresses fairness testing for regression tasks, a gap in the fair machine learning literature, and may aid accountability assessments and algorithm audits.
The authors tackled the lack of fairness testing methods for regression tasks by proposing error parity as a fairness notion and a statistical hypothesis testing methodology to assess group fairness, demonstrating its application in a COVID-19 case study that revealed race-based differences in forecast errors.
Applications of Artificial Intelligence (AI) inform decisions on a growing number of aspects of human lives. Society responds by imposing legal and social expectations for the accountability of such automated decision systems (ADSs). Fairness, a fundamental constituent of AI accountability, is concerned with the just treatment of individuals and sensitive groups (e.g., based on sex or race). While many studies focus on fair learning and fairness testing for classification tasks, the literature on how to examine fairness in regression tasks is rather limited. This work presents error parity as a regression fairness notion and introduces a testing methodology to assess group fairness based on a statistical hypothesis testing procedure. The error parity test checks whether prediction errors are distributed similarly across sensitive groups to determine whether an ADS is fair. It is followed by a suitable permutation test that compares groups on several statistics to explore disparities and identify impacted groups. The usefulness and applicability of the proposed methodology are demonstrated via a case study on county-level COVID-19 projections in the US, which revealed race-based differences in forecast errors. Overall, the proposed regression fairness testing methodology fills a gap in the fair machine learning literature and may serve as part of larger accountability assessments and algorithm audits.
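To make the group comparison step concrete, the following is a minimal sketch of a two-group permutation test on prediction errors. It is an illustration under stated assumptions, not the authors' implementation: the function name `permutation_test`, the choice of absolute errors, the three statistics compared (mean, median, standard deviation), and the synthetic data are all hypothetical. The abstract does not specify which statistics or which first-stage distribution test the methodology uses.

```python
import random
import statistics


def permutation_test(errors_a, errors_b, statistic, n_permutations=1000, seed=0):
    """Two-sided permutation test for a difference in `statistic` between groups.

    Hypothetical helper: returns the observed absolute difference and a p-value.
    """
    rng = random.Random(seed)
    observed = abs(statistic(errors_a) - statistic(errors_b))
    pooled = list(errors_a) + list(errors_b)
    n_a = len(errors_a)
    extreme = 0
    for _ in range(n_permutations):
        # Under the null hypothesis, group labels are exchangeable,
        # so we reshuffle the pooled errors and split them again.
        rng.shuffle(pooled)
        diff = abs(statistic(pooled[:n_a]) - statistic(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Synthetic absolute prediction errors for two hypothetical sensitive groups.
data_rng = random.Random(42)
errors_group_a = [abs(data_rng.gauss(0, 1)) for _ in range(200)]
errors_group_b = [abs(data_rng.gauss(0, 3)) for _ in range(200)]

# Compare the groups on several statistics (the choice here is an assumption).
stats_to_compare = {
    "mean": statistics.mean,
    "median": statistics.median,
    "stdev": statistics.stdev,
}
results = {
    name: permutation_test(errors_group_a, errors_group_b, stat)
    for name, stat in stats_to_compare.items()
}
for name, (diff, p) in results.items():
    print(f"{name}: observed difference = {diff:.3f}, p-value = {p:.4f}")
```

A small p-value for a given statistic suggests that the two groups' errors differ on that statistic. Because several statistics are tested, a real audit would likely also need a multiple-comparison correction, which this sketch omits.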