A Test Suite and Manual Evaluation of Document-Level NMT at WMT19
This work addresses a difficulty that researchers and practitioners face: evaluating machine translation as systems advance to document-level tasks. The contribution is incremental, since it builds on existing evaluation methods.
The authors tackle the challenge of evaluating document-level neural machine translation by creating a test suite for WMT19 that assesses discourse phenomena, and they manually analyze system outputs to identify translation errors relevant to document-level translation.
As the quality of machine translation rises and neural machine translation (NMT) moves from sentence-level to document-level translation, it is becoming increasingly difficult to evaluate the output of translation systems. We provide a test suite for WMT19 aimed at assessing how MT systems participating in the News Translation Task handle discourse phenomena. We have manually checked the outputs and identified types of translation errors that are relevant to document-level translation.