CL | Nov 14, 2022

Does Debiasing Inevitably Degrade the Model Performance

arXiv:2211.07350v2 | 3 citations | h-index: 7
Originality: Incremental advance
AI Analysis

This paper addresses social justice concerns in AI by providing a method that reduces gender bias without harming model performance, an incremental improvement over existing debiasing techniques.

The paper tackles the problem of gender bias in language models and the performance degradation caused by debiasing methods, proposing a theoretical framework and a causality-detection fine-tuning approach that partially mitigates bias while avoiding performance degradation.

Gender bias in language models has attracted considerable attention because it threatens social justice. However, most current debiasing methods degrade the model's performance on other tasks, and the mechanism behind this degradation remains poorly understood. We propose a theoretical framework explaining three candidate mechanisms of the language model's gender bias. We use this framework to explain why current debiasing methods cause performance degradation. We also discover a pathway through which debiasing does not degrade model performance. We further develop a causality-detection fine-tuning approach to correct gender bias. Numerical experiments demonstrate that our method yields double dividends: partially mitigating gender bias while avoiding performance degradation.

