CL

Nov 2, 2022

MT-GenEval: A Counterfactual and Contextual Dataset for Evaluating Gender Accuracy in Machine Translation

Anna Currey, Maria Nădejde, Raghavendra Pappagari, Mia Mayer, Stanislas Lauly, Xing Niu, Benjamin Hsu, Georgiana Dinu

Amazon

arXiv:2211.01355v125.0305 citations

h-index: 21

Has Code

Originality: Synthesis-oriented

AI Analysis

This work addresses the need for fine-grained evaluation in machine translation, particularly of gender accuracy, which affects output fluency, translation accuracy, and ethics. The contribution is incremental, as it builds on existing benchmarks.

The authors address the problem of evaluating gender accuracy in machine translation by introducing MT-GenEval, a benchmark dataset for translation from English into eight languages. The result is a publicly available resource with realistic, gender-balanced, counterfactual data.

As generic machine translation (MT) quality has improved, the need for targeted benchmarks that explore fine-grained aspects of quality has increased. In particular, gender accuracy in translation can have implications in terms of output fluency, translation accuracy, and ethics. In this paper, we introduce MT-GenEval, a benchmark for evaluating gender accuracy in translation from English into eight widely spoken languages. MT-GenEval complements existing benchmarks by providing realistic, gender-balanced, counterfactual data in eight language pairs where the gender of individuals is unambiguous in the input segment, including multi-sentence segments requiring inter-sentential gender agreement. Our data and code are publicly available under a CC BY SA 3.0 license.
