CL · Oct 14, 2022

InterFair: Debiasing with Natural Language Feedback for Fair Interpretable Predictions

Bodhisattwa Prasad Majumder, Zexue He, Julian McAuley

arXiv:2210.07440v216.9135 citations · h-index: 72

Originality: Incremental advance

AI Analysis

This paper addresses the challenge of subjective fairness when debiasing NLP applications. It is an incremental advance that uses human feedback to improve both interpretability and fairness.

The paper proposes an interactive debiasing method for NLP models in which users provide natural language feedback to balance task performance against bias mitigation. In one setup, feedback reduced bias in the explanations by 5-8% while prediction accuracy stayed the same; in the other, it improved bias mitigation and task performance (4-5%) at the same time.

Debiasing methods in NLP models traditionally focus on isolating information related to a sensitive attribute (e.g., gender or race). We instead argue that a favorable debiasing method should use sensitive information 'fairly,' with explanations, rather than blindly eliminating it. This fair balance is often subjective and can be challenging to achieve algorithmically. We explore two interactive setups with a frozen predictive model and show that users able to provide feedback can achieve a better and fairer balance between task performance and bias mitigation. In one setup, users, by interacting with test examples, further decreased bias in the explanations (5-8%) while maintaining the same prediction accuracy. In the other setup, human feedback was able to disentangle associated bias and predictive information from the input leading to superior bias mitigation and improved task performance (4-5%) simultaneously.
