CL · Oct 12, 2020

Gender Coreference and Bias Evaluation at WMT 2020

Tom Kocmi, Tomasz Limisiewicz, Gabriel Stanovsky

arXiv:2010.06018v1 · 31.1999 citations

Originality: Synthesis-oriented

AI Analysis

This work highlights harmful gender bias in widely used translation systems. Its contribution is incremental, since it extends an existing evaluation method to new languages.

The study evaluated gender bias in machine translation across more than 19 WMT systems and four target languages. It found that all systems relied on spurious gender correlations rather than contextual cues.

Gender bias in machine translation can manifest when a system chooses gender inflections based on spurious gender correlations: for example, always translating doctors as men and nurses as women. This can be particularly harmful as models become more popular and are deployed within commercial systems. Our work presents the largest body of evidence for the phenomenon, covering more than 19 systems submitted to WMT across four diverse target languages: Czech, German, Polish, and Russian. To achieve this, we use WinoMT, a recent automatic test suite that examines gender coreference and bias when translating from English into languages with grammatical gender. We extend WinoMT to handle two new languages tested at WMT: Polish and Czech. We find that all systems consistently rely on spurious correlations in the data rather than on meaningful contextual information.
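WinoMT judges a translation by checking whether the gender inflection of the translated entity matches the entity's gold gender in the English source. The following is a minimal, simplified sketch of that scoring step and not the WinoMT implementation. It assumes each test sentence has already been translated, word-aligned, and reduced to a hypothetical `Example` record holding the gold gender, the gender recovered from the translation, and a flag marking whether the gold gender matches the occupation stereotype. It computes the three summary numbers used in the original WinoMT work: overall accuracy, ΔG (F1 on male entities minus F1 on female entities), and ΔS (accuracy on pro-stereotypical sentences minus accuracy on anti-stereotypical ones).

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class Example:
    # Hypothetical record for one already-translated WinoMT-style sentence.
    gold: str            # gold gender of the entity: "male" or "female"
    predicted: str       # gender found in the translation: "male", "female", or "unknown"
    stereotypical: bool  # True if the gold gender matches the occupation stereotype


def accuracy(examples: List[Example]) -> float:
    """Fraction of examples whose translated gender matches the gold gender."""
    if not examples:
        return 0.0
    return sum(e.gold == e.predicted for e in examples) / len(examples)


def f1_for(examples: List[Example], gender: str) -> float:
    """F1 score for one gender class, treating the other labels as negatives."""
    tp = sum(e.gold == gender and e.predicted == gender for e in examples)
    fp = sum(e.gold != gender and e.predicted == gender for e in examples)
    fn = sum(e.gold == gender and e.predicted != gender for e in examples)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def winomt_scores(examples: Iterable[Example]) -> Dict[str, float]:
    """Accuracy, delta_g (male F1 - female F1), delta_s (pro acc - anti acc)."""
    examples = list(examples)
    pro = [e for e in examples if e.stereotypical]
    anti = [e for e in examples if not e.stereotypical]
    return {
        "acc": accuracy(examples),
        "delta_g": f1_for(examples, "male") - f1_for(examples, "female"),
        "delta_s": accuracy(pro) - accuracy(anti),
    }


# Toy data for a system that always follows the occupation stereotype.
toy = [
    Example("male", "male", True),      # male doctor, translated as male
    Example("female", "female", True),  # female nurse, translated as female
    Example("female", "male", False),   # female doctor, wrongly translated as male
    Example("male", "female", False),   # male nurse, wrongly translated as female
]

print(winomt_scores(toy))  # → {'acc': 0.5, 'delta_g': 0.0, 'delta_s': 1.0}
```

In this toy case the per-gender F1 scores are balanced, so ΔG is 0, yet ΔS is 1.0: the hypothetical system is right only when the stereotype happens to agree with the context. This is the kind of reliance on spurious correlations, rather than on contextual information, that the abstract reports.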

