CL, Jun 19, 2019

The Effect of Translationese in Machine Translation Test Sets

arXiv:1906.08069v1, 121 citations
AI Analysis

This paper addresses a data-bias issue in machine translation evaluation. The contribution is incremental: it builds on prior work on translationese in training data and extends that analysis to test sets.

The study investigates how translationese in test sets inflates human evaluation scores for machine translation systems and can alter system rankings. It finds that the impact on a given translation direction is inversely correlated with the translation quality that state-of-the-art systems can attain for that direction.

The effect of translationese has been studied in the field of machine translation (MT), mostly with respect to training data. We study in depth the effect of translationese on test data, using the test sets from the last three editions of WMT's news shared task, containing 17 translation directions. We show evidence that (i) the use of translationese in test sets results in inflated human evaluation scores for MT systems; (ii) in some cases system rankings do change and (iii) the impact translationese has on a translation direction is inversely correlated to the translation quality attainable by state-of-the-art MT systems for that direction.

