Discourse Cohesion Evaluation for Document-Level Neural Machine Translation
This work addresses the need for better evaluation metrics in document-level NMT, though its contribution is incremental, since it builds on existing cohesion concepts.
The paper tackles the problem that sentence-level metrics such as BLEU fail to evaluate document-level neural machine translation (NMT). It proposes a Discourse Cohesion Evaluation Method (DCoEM), together with a test suite based on four cohesive manners, and reports that the method is practical and essential for assessing document translations.
It is well known that translations generated by an excellent document-level neural machine translation (NMT) model are consistent and coherent. However, existing sentence-level evaluation metrics such as BLEU can hardly reflect a model's performance at the document level. To tackle this issue, we propose a Discourse Cohesion Evaluation Method (DCoEM) and contribute a new test suite that considers four cohesive manners (reference, conjunction, substitution, and lexical cohesion) to measure the cohesiveness of document translations. Evaluation results on recent document-level NMT systems show that our method is practical and essential for evaluating translations at the document level.