LG · AI · CY · Nov 21, 2022

Bursting the Burden Bubble? An Assessment of Sharma et al.'s Counterfactual-based Fairness Metric

Yochem van Rosmalen, Florian van der Steen, Sebastiaan Jans, Daan van der Weijden

arXiv:2211.11512v1 · 3.31 citations · h-index: 2 · Has Code

Originality: Synthesis-oriented

AI Analysis

This work addresses the problem of evaluating the fairness of machine learning models toward unprivileged communities. It is incremental: it compares an existing metric with a recently proposed one rather than introducing a novel method.

The study compares the Burden fairness metric, which uses counterfactuals to estimate the average distance from negatively classified individuals to the decision boundary, with statistical parity on synthetic and real-world datasets. It finds that Burden can detect unfairness that statistical parity misses, and that the two metrics sometimes disagree on which group is treated unfairly.
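
For reference, the two metrics can be written roughly as follows. This is a sketch based on the descriptions on this page, not a transcription of the paper: f is the classifier (0 = negative outcome), g a group, c*(x) the counterfactual found for x, d a distance function, and A the sensitive attribute with values a and b. Sharma et al.'s exact normalization of Burden may differ from the per-negative average assumed here.

$$
\mathrm{Burden}(g) = \frac{1}{|N_g|} \sum_{x \in N_g} d\bigl(x, c^*(x)\bigr), \qquad N_g = \{\, x \in g : f(x) = 0 \,\}
$$

$$
\mathrm{SP}(a, b) = P(\hat{y} = 1 \mid A = a) - P(\hat{y} = 1 \mid A = b)
$$

A higher Burden means the rejected members of a group must, on average, change more to receive a positive decision; SP equal to 0 means both groups receive positive decisions at the same rate.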

Machine learning has received increasing negative publicity in recent years because of biased, unfair, and uninterpretable models. There is growing interest in making machine learning models fairer toward unprivileged communities, such as women or people of color. Evaluating the fairness of a model requires metrics. A novel metric for evaluating fairness between groups is Burden, which uses counterfactuals to approximate the average distance from the negatively classified individuals in a group to the model's decision boundary. The goal of this study is to compare Burden with statistical parity, a well-known fairness metric, and to identify Burden's advantages and disadvantages. We do this by calculating the Burden and statistical parity of a sensitive attribute in three datasets: two synthetic datasets constructed to display differences between the two metrics, and one real-world dataset. We show that Burden can reveal unfairness where statistical parity cannot, and that the two metrics can even disagree on which group is treated unfairly. We conclude that Burden is a valuable metric but does not replace statistical parity; rather, it is valuable to use both.
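
The difference between the two metrics can be illustrated with a minimal sketch. It assumes a linear classifier, so the closest counterfactual can be computed exactly by projecting a point onto the decision hyperplane; Sharma et al. instead approximate counterfactuals with a model-agnostic search, and the per-negative averaging follows the abstract's description rather than the paper's code. All function names and the toy data are hypothetical and invented for illustration, not taken from the study's datasets.

```python
import math
from typing import List, Sequence

# Hypothetical names throughout: an illustrative sketch, not the authors' code.
Vector = Sequence[float]


def score(w: Vector, b: float, x: Vector) -> float:
    """Signed score of a linear classifier: w·x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def predict(w: Vector, b: float, x: Vector) -> int:
    """1 = favorable outcome, 0 = negative outcome."""
    return 1 if score(w, b, x) >= 0 else 0


def counterfactual(w: Vector, b: float, x: Vector) -> List[float]:
    """Project x onto the boundary w·x + b = 0.

    For a linear model this is the closest (L2) counterfactual. For other
    models it would have to be approximated, e.g. by a search procedure.
    """
    s = score(w, b, x)
    norm_sq = sum(wi * wi for wi in w)
    return [xi - s * wi / norm_sq for xi, wi in zip(x, w)]


def burden(w: Vector, b: float, group: Sequence[Vector]) -> float:
    """Mean distance to the counterfactual over the group's negatively classified members."""
    negatives = [x for x in group if predict(w, b, x) == 0]
    if not negatives:
        return 0.0  # assumption: a group with no rejections carries no burden
    return sum(math.dist(x, counterfactual(w, b, x)) for x in negatives) / len(negatives)


def positive_rate(w: Vector, b: float, group: Sequence[Vector]) -> float:
    """Fraction of the group that receives the favorable outcome."""
    return sum(predict(w, b, x) for x in group) / len(group)


def statistical_parity_difference(w: Vector, b: float,
                                  group_a: Sequence[Vector],
                                  group_b: Sequence[Vector]) -> float:
    """P(y_hat = 1 | A = a) - P(y_hat = 1 | A = b); 0.0 means parity."""
    return positive_rate(w, b, group_a) - positive_rate(w, b, group_b)


if __name__ == "__main__":
    # Invented toy data: the boundary is x0 = 1 and both groups have a 50% positive rate.
    w, b = [1.0, 0.0], -1.0
    group_a = [[2.0, 0.0], [0.5, 0.0]]   # rejected member is 0.5 from the boundary
    group_b = [[2.0, 0.0], [-3.0, 0.0]]  # rejected member is 4.0 from the boundary
    print(statistical_parity_difference(w, b, group_a, group_b))  # 0.0
    print(burden(w, b, group_a), burden(w, b, group_b))  # 0.5 4.0
```

In this toy case statistical parity reports no difference, while Burden shows that the rejected member of group B must move eight times farther than the one in group A to flip the decision. This mirrors, on invented data, the abstract's point that Burden can reveal unfairness that statistical parity does not.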
