CL | Oct 22, 2023

DiFair: A Benchmark for Disentangled Assessment of Gender Knowledge and Bias

arXiv:2310.14329v1 | 138 citations | h-index: 37
Originality: Incremental advance
AI Analysis

This paper addresses the need for a more comprehensive evaluation of gender bias mitigation in NLP. The contribution is incremental, since it builds on existing debiasing work.

The authors address the evaluation of gender bias mitigation in language models by introducing DiFair, a benchmark that assesses both bias reduction and the preservation of useful gender knowledge. They find that debiasing techniques often reduce bias but also lower the model's gender knowledge.

Numerous debiasing techniques have been proposed to mitigate the gender bias that is prevalent in pretrained language models. These are often evaluated on datasets that check the extent to which the model is gender-neutral in its predictions. Importantly, this evaluation protocol overlooks the possible adverse impact of bias mitigation on useful gender knowledge. To fill this gap, we propose DiFair, a manually curated dataset based on masked language modeling objectives. DiFair allows us to introduce a unified metric, gender invariance score, that not only quantifies a model's biased behavior, but also checks if useful gender knowledge is preserved. We use DiFair as a benchmark for a number of widely-used pretrained language models and debiasing techniques. Experimental results corroborate previous findings on the existing gender biases, while also demonstrating that although debiasing techniques ameliorate the issue of gender bias, this improvement usually comes at the price of lowering useful gender knowledge of the model.
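
The two-sided evaluation described in the abstract can be illustrated with a small sketch. This is not the paper's gender invariance score, whose exact definition is not given here. It is a toy stand-in that assumes we already have, for each masked sentence, the probabilities a model assigns to a male and a female filler word. The function names (`neutrality`, `specificity`, `combined_score`), the formulas, and the choice of a harmonic mean are all assumptions made for illustration.

```python
# Toy illustration only: NOT the metric defined in the DiFair paper.
# Inputs are hypothetical masked-token probabilities for gendered fillers.

def neutrality(p_male: float, p_female: float) -> float:
    """Score a gender-neutral sentence: 1.0 if balanced, 0.0 if one-sided."""
    total = p_male + p_female
    if total == 0:
        return 1.0  # no gendered probability mass at all: treat as neutral
    return 1.0 - abs(p_male - p_female) / total


def specificity(p_correct: float, p_incorrect: float) -> float:
    """Score a gender-specific sentence: 1.0 if the correct gender gets all
    of the gendered mass, 0.0 if the split is even or the model is wrong."""
    total = p_correct + p_incorrect
    if total == 0:
        return 0.0
    return max(0.0, (p_correct - p_incorrect) / total)


def combined_score(neutral_examples, specific_examples):
    """Average both scores and combine them with a harmonic mean
    (an assumed choice), so a model must do well on both sides."""
    n = sum(neutrality(m, f) for m, f in neutral_examples) / len(neutral_examples)
    s = sum(specificity(c, i) for c, i in specific_examples) / len(specific_examples)
    combined = 2 * n * s / (n + s) if (n + s) > 0 else 0.0
    return n, s, combined


# A model that is balanced on neutral sentences and keeps gender knowledge.
balanced = combined_score([(0.5, 0.5)], [(0.9, 0.1)])

# An over-debiased model: balanced everywhere, so gender knowledge is lost.
over_debiased = combined_score([(0.5, 0.5)], [(0.5, 0.5)])

print(balanced)
print(over_debiased)
```

In this sketch the over-debiased model scores perfectly on neutrality but zero on specificity, so its combined score collapses. That mirrors the trade-off the abstract reports, where debiasing reduces bias at the cost of useful gender knowledge.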

Code Implementations: 1 repo
