CL Aug 21, 2018

Has Machine Translation Achieved Human Parity? A Case for Document-level Evaluation

arXiv:1808.07048v11164 citations
Originality: Incremental advance
AI Analysis

The paper highlights the need for document-level evaluation in machine translation: as systems improve, the remaining errors become hard or impossible to spot in isolated sentences, which matters to NLP researchers and practitioners who rely on sentence-level evaluation.

The paper tested the claim that neural machine translation achieves human parity by comparing human and machine translations at both sentence and document levels, finding that human raters preferred human translations more strongly at the document level.

Recent research suggests that neural machine translation achieves parity with professional human translation on the WMT Chinese–English news translation task. We empirically test this claim with alternative evaluation protocols, contrasting the evaluation of single sentences and entire documents. In a pairwise ranking experiment, human raters assessing adequacy and fluency show a stronger preference for human over machine translation when evaluating documents as compared to isolated sentences. Our findings emphasise the need to shift towards document-level evaluation as machine translation improves to the degree that errors which are hard or impossible to spot at the sentence-level become decisive in discriminating quality of different translation outputs.

Code Implementations: 1 repo
