Shapley meets Rawls: an integrated framework for measuring and explaining unfairness
This work addresses fairness and explainability in AI, for both practitioners and researchers. Its contribution is incremental: it combines existing concepts into a unified framework.
The paper tackles the problem of measuring and explaining unfairness in machine learning by integrating Shapley values with standard group fairness criteria. Applied to the Census Income dataset, the resulting framework identifies 'Age', 'Number of hours', and 'Marital status' as features contributing to gender unfairness, with shorter computation times than traditional bootstrap tests.
Explainability and fairness have mainly been considered separately, with recent exceptions trying to explain the sources of unfairness. This paper shows that the Shapley value can be used to both define and explain unfairness, under standard group fairness criteria. This offers an integrated framework to estimate and conduct inference on unfairness as well as the features that contribute to it. Our framework can also be extended from Shapley values to the family of Efficient-Symmetric-Linear (ESL) values, some of which offer more robust definitions of fairness and shorter computation times. An illustration is run on the Census Income dataset from the UCI Machine Learning Repository. Our approach shows that ``Age'', ``Number of hours'' and ``Marital status'' generate gender unfairness, with shorter computation times than traditional bootstrap tests.
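
To make the decomposition idea concrete, the following is a minimal sketch and not the paper's implementation. It assumes a hypothetical characteristic function `gap(S)` that returns a group unfairness measure (for instance, a gap in positive prediction rates between genders) when only the features in coalition `S` are active. The numbers in `TOY_GAP` are invented for illustration and are not results from the Census Income data. The sketch computes exact Shapley values by enumerating coalitions, then the equal surplus value, a well-known member of the ESL family that needs far fewer evaluations of `gap`; whether the paper uses this particular ESL value is not stated here.

```python
from itertools import combinations
from math import factorial

FEATURES = ["age", "hours", "marital"]

# Hypothetical unfairness gap for each feature coalition (made-up numbers).
TOY_GAP = {
    frozenset(): 0.00,
    frozenset({"age"}): 0.04,
    frozenset({"hours"}): 0.06,
    frozenset({"marital"}): 0.02,
    frozenset({"age", "hours"}): 0.10,
    frozenset({"age", "marital"}): 0.08,
    frozenset({"hours", "marital"}): 0.08,
    frozenset({"age", "hours", "marital"}): 0.14,
}


def gap(coalition):
    """Look up the toy unfairness measure for a coalition of features."""
    return TOY_GAP[frozenset(coalition)]


def shapley_values(players, value):
    """Exact Shapley values: weighted average of marginal contributions."""
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):
            # Weight of a coalition of size k that does not contain p.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (value(s | {p}) - value(s))
        phi[p] = total
    return phi


def equal_surplus_values(players, value):
    """Equal surplus value: stand-alone worth plus an equal share of the rest.

    Assumes value(empty set) == 0. Needs only n + 1 evaluations of `value`,
    versus 2**n for the exact Shapley computation above.
    """
    n = len(players)
    solo = {p: value({p}) for p in players}
    surplus = value(players) - sum(solo.values())
    return {p: solo[p] + surplus / n for p in players}


phi = shapley_values(FEATURES, gap)
es = equal_surplus_values(FEATURES, gap)

for name in FEATURES:
    print(f"{name:8s} shapley={phi[name]:.4f} equal_surplus={es[name]:.4f}")
```

Both allocations are efficient: the per-feature contributions sum to the total gap `gap(FEATURES)`, which is what lets the same quantity serve both as a measure of unfairness and as its explanation. The exact enumeration grows exponentially in the number of features, so a sketch like this is only practical for small feature sets.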