GATE: A Challenge Set for Gender-Ambiguous Translation Examples
This paper addresses gender bias in machine translation, which matters to users of translation systems, but the contribution is incremental because it builds on existing gender rewriter methods.
The authors tackle the problem of machine translation systems defaulting to stereotypical gender roles when the gender in the source is ambiguous. They present GATE, a linguistically diverse corpus of gender-ambiguous source sentences with multiple alternative translations, and use it to evaluate their own translation rewriter system.
Although recent years have brought significant progress in improving translation of unambiguously gendered sentences, translation of ambiguously gendered input remains relatively unexplored. When source gender is ambiguous, machine translation models typically default to stereotypical gender roles, perpetuating harmful bias. Recent work has led to the development of "gender rewriters" that generate alternative gender translations on such ambiguous inputs, but such systems are plagued by poor linguistic coverage. To encourage better performance on this task we present and release GATE, a linguistically diverse corpus of gender-ambiguous source sentences along with multiple alternative target language translations. We also provide tools for evaluation and system analysis when using GATE and use them to evaluate our translation rewriter system.
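To make the corpus description concrete, here is a minimal sketch of how one entry (a gender-ambiguous source sentence with several alternative translations) might be represented and scored against a rewriter's output. Everything in it is an assumption for illustration: the `AmbiguousExample` record, the `coverage` function, the exact-match scoring, and the English-Spanish sentence pair are hypothetical and are not taken from GATE or its evaluation tools.

```python
from dataclasses import dataclass
from typing import FrozenSet, Set


@dataclass(frozen=True)
class AmbiguousExample:
    """Hypothetical record: one gender-ambiguous source sentence
    and the set of acceptable alternative translations."""
    source: str
    references: FrozenSet[str]


def coverage(example: AmbiguousExample, system_outputs: Set[str]) -> float:
    """Fraction of reference alternatives the system produced.

    Uses exact string match, which is an assumption of this sketch;
    a real evaluation may use a softer comparison.
    """
    if not example.references:
        return 0.0
    matched = example.references & system_outputs
    return len(matched) / len(example.references)


# Made-up illustration, not an entry from the GATE corpus.
example = AmbiguousExample(
    source="The doctor arrived.",
    references=frozenset({"El doctor llegó.", "La doctora llegó."}),
)

# A system that outputs only the masculine variant covers half the references.
print(coverage(example, {"El doctor llegó."}))  # prints 0.5
```

A set of references, rather than a single reference translation, is what lets a score like this reward a system for producing every valid gender alternative instead of only the stereotypical one.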