CL · CY · Jan 13, 2024

MiTTenS: A Dataset for Evaluating Gender Mistranslation

DeepMind
arXiv:2401.06935v3 · 26 citations · h-index: 17 · EMNLP
Originality: Synthesis-oriented
AI Analysis

This work addresses harmful gender errors in translation systems that affect users across diverse languages, though its contribution is incremental because it focuses on creating a dataset for evaluation.

The authors tackled the problem of gender mistranslation in translation systems by introducing MiTTenS, a dataset covering 26 languages, and found that all evaluated systems exhibit gender mistranslation and potential harm, even in high-resource languages.

Translation systems, including foundation models capable of translation, can produce errors that result in gender mistranslation, and such errors can be especially harmful. To measure the extent of such potential harms when translating into and out of English, we introduce a dataset, MiTTenS, covering 26 languages from a variety of language families and scripts, including several traditionally under-represented in digital resources. The dataset is constructed with handcrafted passages that target known failure patterns, longer synthetically generated passages, and natural passages sourced from multiple domains. We demonstrate the usefulness of the dataset by evaluating both neural machine translation systems and foundation models, and show that all systems exhibit gender mistranslation and potential harm, even in high-resource languages.

Code Implementations: 1 repo
