LG | AI | Dec 20, 2020

Biased Models Have Biased Explanations

arXiv:2012.10986v1 | 26 citations
AI Analysis

This work addresses the problem of biased explanations for machine learning models, which is important for researchers and practitioners aiming to build more transparent and fair AI systems. It is an incremental contribution to the field of FairML.

This paper investigates the relationship between model bias and explanation bias, hypothesizing that biased models produce biased explanations. The authors define group fairness in terms of explanations and propose a method to detect unfairness in black-box models. They also introduce a novel post-processing mitigation technique that improves individual fairness in recourse while preserving group-level fairness.

We study fairness in Machine Learning (FairML) through the lens of attribution-based explanations generated for machine learning models. Our hypothesis is: Biased Models have Biased Explanations. To establish that, we first translate existing statistical notions of group fairness and define these notions in terms of explanations given by the model. Then, we propose a novel way of detecting (un)fairness for any black box model. We further look at post-processing techniques for fairness and reason how explanations can be used to make a bias mitigation technique more individually fair. We also introduce a novel post-processing mitigation technique which increases individual fairness in recourse while maintaining group-level fairness.
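To make the hypothesis concrete, the sketch below contrasts a standard prediction-level group fairness notion (statistical parity difference) with an explanation-level analogue: the gap in mean attribution assigned to the sensitive feature across groups. It is an illustration only and not the paper's definitions or method. The toy linear scorer, the feature names `income` and `group`, the zero baseline, the threshold, and the function names are all assumptions introduced here.

```python
from statistics import mean

# Hypothetical toy setup (not from the paper): a fixed linear scorer over
# two features, where `group` is the sensitive attribute (0 or 1).
BASELINE = {"income": 0.0, "group": 0.0}  # assumed reference input
THRESHOLD = 0.5                           # assumed decision threshold


def score(weights, x):
    """Linear score: sum of weight * feature value."""
    return sum(weights[k] * x[k] for k in weights)


def predict(weights, x):
    """Positive decision (1) when the score reaches the threshold."""
    return 1 if score(weights, x) >= THRESHOLD else 0


def attributions(weights, x):
    """For a linear model, weight * (value - baseline) is an exact
    additive attribution of the score to each feature."""
    return {k: weights[k] * (x[k] - BASELINE[k]) for k in weights}


def statistical_parity_difference(weights, rows):
    """Positive-prediction rate of group 0 minus that of group 1."""
    def rate(g):
        return mean(predict(weights, r) for r in rows if r["group"] == g)
    return rate(0) - rate(1)


def sensitive_attribution_gap(weights, rows):
    """Mean attribution given to `group` in group 0 minus that in group 1."""
    def avg(g):
        return mean(attributions(weights, r)["group"]
                    for r in rows if r["group"] == g)
    return avg(0) - avg(1)


# Both groups share the same income values, so any disparity comes
# from the weight placed on the sensitive feature.
incomes = [0.4, 0.6, 0.8, 1.2]
rows = [{"income": v, "group": g} for g in (0, 1) for v in incomes]

biased = {"income": 1.0, "group": -0.5}    # penalizes group 1
unbiased = {"income": 1.0, "group": 0.0}   # ignores the sensitive feature

for name, w in (("biased", biased), ("unbiased", unbiased)):
    print(name,
          statistical_parity_difference(w, rows),
          sensitive_attribution_gap(w, rows))
```

In this toy case the weights that penalize `group` produce a nonzero value for both measures, while the weights that ignore it produce zero for both. That is the pattern the hypothesis describes; a real black-box model would need an attribution method such as SHAP or LIME in place of the exact linear attribution used here.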
