CL, Sep 13, 2021

Fine Grained Human Evaluation for English-to-Chinese Machine Translation: A Case Study on Scientific Text

arXiv:2110.14766v1

Originality: Synthesis-oriented

AI Analysis

The findings highlight the gap in machine translation performance in professional domains such as scientific text and underscore the need for domain-specific research and resources.

The paper presents a fine-grained human evaluation of four Chinese-English neural machine translation systems on scientific abstracts and finds that all systems have average error rates above 10%, so their output requires significant post-editing for academic use.

Recent research suggests that neural machine translation (NMT) in the news domain has reached human-level performance, but in other professional domains it remains far below that level. In this paper, we conduct a fine-grained, systematic human evaluation of four widely used Chinese-English NMT systems on scientific abstracts collected from published journals and books. Our human evaluation results show that all the systems have error rates above 10% on average, so their output requires considerable post-editing effort for real academic use. Furthermore, we categorize six main error types and provide some real examples. Our findings emphasise that research attention in the MT community should shift from generic short-text translation to professional machine translation, and that large-scale bilingual corpora should be built for these specific domains.
