CL AI | Jul 11, 2024

fairBERTs: Erasing Sensitive Information Through Semantic and Fairness-aware Perturbations

Jinfeng Li, Yuefeng Chen, Xiangyu Liu, Longtao Huang, Rong Zhang, Hui Xue

arXiv:2407.08189v11.01 citations | h-index: 20

Originality: Incremental advance

AI Analysis

This work addresses fairness issues in fine-tuned BERT models, which is crucial for ethical AI applications, though it is incremental as it builds on existing adversarial methods.

The paper tackles stereotypical biases in pre-trained language models by proposing fairBERTs, a framework that erases sensitive information during fine-tuning using semantic and fairness-aware perturbations generated by a generative adversarial network. Experiments on two real-world tasks show significant fairness improvements while maintaining model utility.

Pre-trained language models (PLMs) have revolutionized both natural language processing research and applications. However, stereotypical biases (e.g., gender and racial discrimination) encoded in PLMs have raised negative ethical implications, which critically limit their broader application. To address these unfairness issues, we present fairBERTs, a general framework for learning fair fine-tuned BERT-series models by erasing protected sensitive information via semantic and fairness-aware perturbations generated by a generative adversarial network. Through extensive qualitative and quantitative experiments on two real-world tasks, we demonstrate the superiority of fairBERTs in mitigating unfairness while maintaining model utility. We also verify the feasibility of transferring the adversarial components of fairBERTs to other conventionally trained BERT-like models to yield fairness improvements. Our findings may shed light on further research on building fairer fine-tuned PLMs.
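
The mechanism the abstract describes can be read as an adversarial objective over three components. The following is only a sketch under assumptions the abstract does not state: the symbols $h$ (the encoder's sequence representation), $G$ (the perturbation generator), $D$ (a discriminator that tries to predict the sensitive attribute $z$), $f$ (the task classifier for label $y$), and the trade-off weight $\lambda$ are hypothetical notation, and the paper's actual losses may differ.

```latex
\begin{align}
\delta &= G(h), \qquad \tilde{h} = h + \delta \\
\mathcal{L}_{D} &= \mathrm{CE}\big(D(\tilde{h}),\, z\big) \\
\mathcal{L}_{G,f} &= \mathrm{CE}\big(f(\tilde{h}),\, y\big) - \lambda\, \mathcal{L}_{D}
\end{align}
```

In this reading, $D$ is trained to minimize $\mathcal{L}_{D}$, while $G$ and $f$ are trained to minimize $\mathcal{L}_{G,f}$. The first term keeps the perturbed representation useful for the task (the "semantic" part), and the second term rewards perturbations that make $z$ hard to recover (the "fairness-aware" part).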
