Explanations as Bias Detectors: A Critical Study of Local Post-hoc XAI Methods for Fairness Exploration
This work addresses fairness and transparency issues in AI for protected groups, but it is incremental as it builds on existing explainability methods without introducing a new paradigm.
The paper tackles the problem of detecting unfairness in AI systems by exploring how local post-hoc explainability methods can be used as bias detectors. It identifies critical challenges such as consistency across methods and aggregation strategies, though it does not provide concrete numerical results.
As Artificial Intelligence (AI) is increasingly used in areas that significantly impact human lives, concerns about fairness and transparency have grown, especially regarding the impact of AI systems on protected groups. Recently, the intersection of explainability and fairness has emerged as an important area for promoting responsible AI systems. This paper explores how explainability methods can be leveraged to detect and interpret unfairness. We propose a pipeline that integrates local post-hoc explanation methods to derive fairness-related insights. During the pipeline design, we identify and address critical questions arising from the use of explanations as bias detectors, such as the relationship between distributive and procedural fairness, the effect of removing the protected attribute, the consistency and quality of results across different explanation methods, the impact of various aggregation strategies of local explanations on group fairness evaluations, and the overall trustworthiness of explanations as bias detectors. Our results show the potential of explanation methods as tools for fairness analysis while highlighting the need to carefully consider the aforementioned critical aspects.
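To make the aggregation question concrete, the following is a minimal illustrative sketch, not the pipeline proposed in the paper. It assumes a toy linear model whose mean-centered per-feature contributions stand in for the output of a local post-hoc explainer, and it compares two hypothetical aggregation strategies (signed mean and mean absolute value) for the attribution of a binary protected attribute. The function names, weights, and data are all assumptions made for illustration.

```python
import numpy as np


def local_linear_attributions(weights, X):
    """Per-instance attributions for a linear model f(x) = w . x + b.

    The attribution of feature j for instance i is w_j * (x_ij - mean_j),
    i.e. the feature's contribution relative to the dataset average.
    This is an assumed stand-in for a real local post-hoc explainer.
    """
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) * np.asarray(weights, dtype=float)


def aggregate_by_group(attributions, groups, feature_idx, how="mean"):
    """Aggregate the local attributions of one feature within each group."""
    column = attributions[:, feature_idx]
    result = {}
    for g in np.unique(groups):
        values = column[groups == g]
        if how == "mean":
            result[g.item()] = float(values.mean())
        elif how == "mean_abs":
            result[g.item()] = float(np.abs(values).mean())
        else:
            raise ValueError(f"unknown aggregation strategy: {how}")
    return result


# Toy data (hypothetical): column 0 is a binary protected attribute,
# column 1 is an ordinary feature.
X = np.array([[1, 2.0], [1, 4.0], [0, 2.0], [0, 4.0]])
weights = np.array([2.0, 0.5])  # assumed model weights
groups = X[:, 0].astype(int)

attributions = local_linear_attributions(weights, X)
signed = aggregate_by_group(attributions, groups, feature_idx=0, how="mean")
magnitude = aggregate_by_group(attributions, groups, feature_idx=0, how="mean_abs")

print("signed mean per group:  ", signed)
print("mean absolute per group:", magnitude)
```

In this toy setting the signed mean separates the two groups, while the mean absolute value assigns them the same score. This is one simple way in which the choice of aggregation strategy can change a group-level fairness reading of the same local explanations.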