CL, LG · Sep 24, 2020

Type B Reflexivization as an Unambiguous Testbed for Multilingual Multi-Task Gender Bias

Ana Valeria Gonzalez, Maria Barrett, Rasmus Hvingelby, Kellie Webster, Anders Søgaard

arXiv:2009.11982

Originality: Incremental advance

AI Analysis

This paper addresses gender bias in NLP in multilingual contexts, offering a more precise testbed than English-only studies. Its contribution is incremental: it extends bias detection to one specific linguistic phenomenon.

The authors construct a multilingual, multi-task challenge dataset for languages with type B reflexivization, such as Swedish and Russian. In these languages, a model that resolves a gendered possessive pronoun to the subject of its own clause is unambiguously wrong, rather than merely expressing a questionable preference. They find evidence for gender bias across all task-language combinations and correlate it with national labor market statistics.

The one-sided focus on English in previous studies of gender bias in NLP misses out on opportunities in other languages: English challenge datasets such as GAP and WinoGender highlight model preferences that are "hallucinatory", e.g., disambiguating gender-ambiguous occurrences of 'doctor' as male doctors. We show that for languages with type B reflexivization, e.g., Swedish and Russian, we can construct multi-task challenge datasets for detecting gender bias that lead to unambiguously wrong model predictions: In these languages, the direct translation of 'the doctor removed his mask' is not ambiguous between a coreferential reading and a disjoint reading. Instead, the coreferential reading requires a non-gendered pronoun, and the gendered, possessive pronouns are anti-reflexive. We present a multilingual, multi-task challenge dataset, which spans four languages and four NLP tasks and focuses only on this phenomenon. We find evidence for gender bias across all task-language combinations and correlate model bias with national labor market statistics.
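The distinction the abstract draws can be made concrete with a small sketch. The pronoun tables and function names below are hypothetical and written only for illustration; they are not the authors' code or dataset format, and the word lists cover only a few common forms. The sketch assumes we already know which possessive pronoun a sentence uses and whether a model linked it to the subject of the same clause.

```python
# Hypothetical, incomplete pronoun tables for two type B reflexivization
# languages. These are illustrative assumptions, not the paper's resources.
REFLEXIVE_POSSESSIVES = {
    "sv": {"sin", "sitt", "sina"},            # Swedish, non-gendered
    "ru": {"свой", "свою", "своё", "свои"},   # Russian, a few common forms
}
GENDERED_POSSESSIVES = {
    "sv": {"hans", "hennes"},                 # "his", "her"
    "ru": {"его", "её"},                      # "his", "her"
}


def coreference_allowed(lang: str, possessive: str) -> bool:
    """Return True if the possessive may refer to the subject of its clause."""
    word = possessive.lower()
    if word in REFLEXIVE_POSSESSIVES[lang]:
        return True   # reflexive possessive: the coreferential reading
    if word in GENDERED_POSSESSIVES[lang]:
        return False  # gendered possessive: anti-reflexive, disjoint reading
    raise ValueError(f"unknown possessive {possessive!r} for language {lang!r}")


def is_unambiguous_error(lang: str, possessive: str,
                         model_links_to_subject: bool) -> bool:
    """A prediction is unambiguously wrong when the model links an
    anti-reflexive (gendered) possessive to the clause subject."""
    return model_links_to_subject and not coreference_allowed(lang, possessive)


# Swedish: "Läkaren tog av sin mask" (own mask) versus
# "Läkaren tog av hans mask" (someone else's mask).
print(is_unambiguous_error("sv", "sin", True))
print(is_unambiguous_error("sv", "hans", True))
```

In English, by contrast, 'his' permits both readings, so a check like this cannot label either prediction as strictly wrong.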
