Are All Spanish Doctors Male? Evaluating Gender Bias in German Machine Translation
This work addresses gender bias in German machine translation. The contribution is incremental: it extends an existing evaluation method to a new language.
The authors tackled gender bias in German machine translation by creating WinoMTDE, a test set to evaluate occupational stereotyping, and found persistent bias in most models, with a large language model outperforming traditional systems.
We present WinoMTDE, a new gender bias evaluation test set designed to assess occupational stereotyping and underrepresentation in German machine translation (MT) systems. Building on the automatic evaluation method introduced in arXiv:1906.00591v1, we extend the approach to German, a language with grammatical gender. The WinoMTDE dataset comprises 288 German sentences that are balanced with respect to both gender and stereotype, with stereotype labels annotated using German labor statistics. We conduct a large-scale evaluation of five widely used MT systems and a large language model (LLM). Our results reveal persistent bias in most models, with the LLM outperforming the traditional systems. The dataset and evaluation code are publicly available at https://github.com/michellekappl/mt_gender_german.
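As a rough illustration of what an evaluation over a test set balanced by gender and stereotype could report, the sketch below computes accuracy overall, per gold gender, and per stereotype category. It is a minimal sketch and not the released evaluation code: the record fields `gold_gender`, `stereotype`, and `predicted_gender`, the label values, and the function names are all hypothetical, and the toy records are made up for demonstration only.

```python
def accuracy(records, **filters):
    """Share of records whose predicted gender matches the gold gender.

    `filters` restricts the computation to records with matching field values,
    e.g. accuracy(records, gold_gender="female"). Returns None for an empty subset.
    """
    subset = [r for r in records if all(r[k] == v for k, v in filters.items())]
    if not subset:
        return None
    correct = sum(r["predicted_gender"] == r["gold_gender"] for r in subset)
    return correct / len(subset)


def bias_summary(records):
    """Accuracy overall, by gold gender, and by stereotype category."""
    return {
        "overall": accuracy(records),
        "male": accuracy(records, gold_gender="male"),
        "female": accuracy(records, gold_gender="female"),
        "pro": accuracy(records, stereotype="pro"),
        "anti": accuracy(records, stereotype="anti"),
    }


# Hypothetical toy records (not taken from WinoMTDE): one per gender/stereotype cell.
toy_records = [
    {"gold_gender": "male", "stereotype": "pro", "predicted_gender": "male"},
    {"gold_gender": "male", "stereotype": "anti", "predicted_gender": "male"},
    {"gold_gender": "female", "stereotype": "pro", "predicted_gender": "female"},
    {"gold_gender": "female", "stereotype": "anti", "predicted_gender": "male"},
]

summary = bias_summary(toy_records)
```

Under these assumptions, a gap between the `male` and `female` scores, or between the `pro` and `anti` scores, would point to the kind of bias the abstract describes. Balancing the test set across both dimensions is what makes such gaps directly comparable.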