CL May 12, 2020

Reassessing Claims of Human Parity and Super-Human Performance in Machine Translation at WMT 2019

arXiv:2005.05738v131.21004 citations Has Code

Originality: Synthesis-oriented

AI Analysis

This work critically evaluates claims of human parity in machine translation and highlights methodological flaws in the underlying human evaluation that researchers and practitioners should take into account.

The paper reassesses the claims of human parity and super-human performance in machine translation made at WMT 2019. It identifies three potential issues in the human evaluation and conducts a modified evaluation that refutes all of those claims except the claim of human parity for English-to-German.

We reassess the claims of human parity and super-human performance made at the news shared task of WMT 2019 for three translation directions: English-to-German, English-to-Russian and German-to-English. First we identify three potential issues in the human evaluation of that shared task: (i) the limited amount of intersentential context available, (ii) the limited translation proficiency of the evaluators and (iii) the use of a reference translation. We then conduct a modified evaluation taking these issues into account. Our results indicate that all the claims of human parity and super-human performance made at WMT 2019 should be refuted, except the claim of human parity for English-to-German. Based on our findings, we put forward a set of recommendations and open questions for future assessments of human parity in machine translation.
