Challenges in Applying Explainability Methods to Improve the Fairness of NLP Models
The paper addresses a gap relevant to NLP researchers and practitioners: how explainability can combat bias is rarely specified. Its contribution is incremental, however, because it reviews trends rather than proposing new solutions.
The paper reviews how explainability methods are used to address bias in NLP models, identifying current practices and barriers to wider application.
Motivations for methods in explainable artificial intelligence (XAI) often include detecting, quantifying and mitigating bias, and contributing to making machine learning models fairer. However, exactly how an XAI method can help in combating biases is often left unspecified. In this paper, we briefly review trends in explainability and fairness in NLP research, identify the current practices in which explainability methods are applied to detect and mitigate bias, and investigate the barriers preventing XAI methods from being used more widely in tackling fairness issues.