Wasserstein-based fairness interpretability framework for machine learning models
This work addresses fairness interpretability for stakeholders in ML by providing a method to explain bias. It appears incremental: it builds on existing metrics and theories and does not claim broad state-of-the-art improvements.
The authors address the problem of measuring and explaining bias in machine learning models by introducing a fairness interpretability framework that uses the Wasserstein metric to quantify bias across sub-population distributions. Transport theory yields a decomposition of the model bias and the bias explanations into positive and negative contributions, and techniques from cooperative game theory make the bias explanations additive.
The objective of this article is to introduce a fairness interpretability framework for measuring and explaining the bias in classification and regression models at the level of a distribution. In our work, we measure the model bias across sub-population distributions in the model output using the Wasserstein metric. To properly quantify the contributions of predictors, we take into account the favorability of both the model and predictors with respect to the non-protected class. The quantification is accomplished using transport theory, which gives rise to the decomposition of the model bias and bias explanations into positive and negative contributions. To gain more insight into the role of favorability and allow for additivity of bias explanations, we adapt techniques from cooperative game theory.
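
As a rough illustration of the distribution-level bias described above, the following minimal sketch approximates the one-dimensional Wasserstein-1 distance between the model-output distributions of two sub-populations and splits it into a positive and a negative part. It is not the authors' implementation. The function names (`quantile`, `wasserstein_bias`), the convention that higher scores are favorable, the use of linearly interpolated quantiles, and the midpoint grid of size `n_grid` are all assumptions made for this example.

```python
def quantile(sorted_vals, p):
    """Linearly interpolated quantile of a pre-sorted list (hypothetical helper)."""
    pos = p * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


def wasserstein_bias(scores, protected, n_grid=1000):
    """Approximate W1 between the score distributions of two groups.

    Assumption: higher scores are favorable. The positive part collects the
    quantile levels where the non-protected group scores higher, and the
    negative part collects the levels where the protected group scores higher.
    Returns (total, positive, negative) with total = positive + negative.
    """
    non = sorted(s for s, g in zip(scores, protected) if not g)
    prot = sorted(s for s, g in zip(scores, protected) if g)
    if not non or not prot:
        raise ValueError("both sub-populations must be non-empty")

    positive = 0.0
    negative = 0.0
    for i in range(n_grid):
        p = (i + 0.5) / n_grid  # midpoint rule on the quantile level in (0, 1)
        diff = quantile(non, p) - quantile(prot, p)
        if diff > 0:
            positive += diff
        else:
            negative -= diff
    positive /= n_grid
    negative /= n_grid
    return positive + negative, positive, negative


# Example: the two groups have the same mean score, yet the bias is non-zero
# and splits evenly into a positive and a negative part.
scores = [0.0, 1.0, 0.5, 0.5]
protected = [False, False, True, True]
total, pos, neg = wasserstein_bias(scores, protected)
```

The example shows why a distribution-level measure can be more informative than a comparison of group means: the means coincide, but the quantile functions differ on both sides. The total can be cross-checked against `scipy.stats.wasserstein_distance`, which works with the exact empirical distributions and may therefore differ slightly from this interpolated approximation.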