CL · Oct 14, 2022

Controlling Bias Exposure for Fair Interpretable Predictions

Zexue He, Yu Wang, Julian McAuley, Bodhisattwa Prasad Majumder

arXiv:2210.07455v224.5297 citations · h-index: 72 · Has Code

Originality: Incremental advance

AI Analysis

This paper addresses fairness issues in NLP for applications such as hiring or content generation, but the advance is incremental because it builds on prior work on bias mitigation.

The paper tackles the problem of bias in NLP models by proposing a debiasing algorithm that uses sensitive information fairly rather than eliminating it, achieving a desirable trade-off between debiasing and task performance in text classification and generation tasks.

Recent work on reducing bias in NLP models usually focuses on protecting or isolating information related to a sensitive attribute (such as gender or race). However, when sensitive information is semantically entangled with the task information of the input (e.g., gender information is predictive of a profession), a fair trade-off between task performance and bias mitigation is difficult to achieve. Existing approaches make this trade-off by eliminating bias information from the latent space, and they lack control over how much bias needs to be removed. We argue that a favorable debiasing method should use sensitive information 'fairly', rather than blindly eliminating it (Caliskan et al., 2017; Sun et al., 2019; Bogen et al., 2020). In this work, we provide a novel debiasing algorithm that adjusts the predictive model's belief to (1) ignore the sensitive information if it is not useful for the task, and (2) use sensitive information only as much as is necessary for the prediction (while also incurring a penalty). Experimental results on two text classification tasks (influenced by gender) and an open-ended generation task (influenced by race) indicate that our model achieves a desirable trade-off between debiasing and task performance and produces debiased rationales as evidence.
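The two behaviours named in the abstract, ignoring sensitive information when it does not help and paying a penalty when it is used, can be illustrated with a toy objective. The sketch below is not the authors' algorithm: the function `penalized_loss`, the per-token sensitivity scores, the rationale weights, and the label probabilities are all hypothetical values chosen only to show how a penalty term shifts the preferred rationale.

```python
import math

def penalized_loss(label_prob, token_weights, sensitive_scores, penalty=1.0):
    """Toy objective: task loss plus a penalty on sensitive-token usage.

    All inputs are illustrative assumptions, not quantities from the paper.
    """
    # Task term: negative log-likelihood of the correct label.
    task_loss = -math.log(label_prob)
    # Exposure term: rationale weight that falls on sensitive tokens.
    exposure = sum(w * s for w, s in zip(token_weights, sensitive_scores))
    return task_loss + penalty * exposure, exposure

# Hypothetical sensitivity scores for the tokens "she is a surgeon".
sensitive_scores = [0.9, 0.0, 0.0, 0.1]

# Two candidate rationales (weights over the same four tokens).
uses_gender = [0.5, 0.0, 0.0, 0.5]
ignores_gender = [0.0, 0.0, 0.0, 1.0]

# Case 1: the gendered token barely helps the task (0.80 vs 0.78).
loss_use_1, exp_use = penalized_loss(0.80, uses_gender, sensitive_scores)
loss_ignore_1, exp_ignore = penalized_loss(0.78, ignores_gender, sensitive_scores)

# Case 2: the gendered token is needed for the task (0.80 vs 0.30).
loss_use_2, _ = penalized_loss(0.80, uses_gender, sensitive_scores)
loss_ignore_2, _ = penalized_loss(0.30, ignores_gender, sensitive_scores)

print(loss_ignore_1 < loss_use_1)  # ignoring the sensitive token is preferred
print(loss_use_2 < loss_ignore_2)  # using it, with a penalty, is preferred
```

In the first case the small gain in label probability does not offset the exposure penalty, so the rationale that ignores the gendered token has the lower loss. In the second case the task loss dominates, so the sensitive token is used even though the penalty is still paid.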
