CL · Jul 8, 2021

Using CollGram to Compare Formulaic Language in Human and Neural Machine Translation

arXiv:2107.03625v2 · 2 citations

Originality: Synthesis-oriented

AI Analysis

This paper addresses how neural machine translation differs linguistically from human translation, a question relevant to researchers and practitioners. The contribution is incremental, since it builds on existing comparisons of formulaic language.

The study compared formulaic language in human and neural machine translation of newspaper articles, finding that neural translations had fewer low-frequency but strongly-associated sequences and more high-frequency ones, with statistically significant differences and medium-to-large effect sizes.

A comparison of formulaic sequences in human and neural machine translation of quality newspaper articles shows that neural machine translations contain fewer lower-frequency but strongly associated formulaic sequences and more high-frequency formulaic sequences. These differences were statistically significant, and the effect sizes were almost always medium or large. These observations can be related to the differences between second language learners of various levels and between translated and untranslated texts. The comparison between the neural machine translation systems indicates that some systems produce more formulaic sequences of both types than other systems.
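
The contrast between the two types of formulaic sequence can be made concrete with a small sketch. It assumes the two types correspond to bigrams scored with mutual information (MI, which favours rarer but strongly associated pairs) and t-score (which favours frequent pairs), as is common in collocation studies. The abstract does not name the measures, and the function name `bigram_association` and the toy token list are hypothetical, not taken from the paper.

```python
import math
from collections import Counter


def bigram_association(tokens):
    """Score each bigram in `tokens` with MI and t-score.

    Assumption: `tokens` stands in for a large reference corpus; the
    formulas are the standard collocation definitions, not necessarily
    the exact ones used in the paper.
    """
    n = len(tokens)
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    scores = {}
    for (w1, w2), observed in bigrams.items():
        # Expected frequency if the two words were independent.
        expected = unigrams[w1] * unigrams[w2] / n
        mi = math.log2(observed / expected)
        t_score = (observed - expected) / math.sqrt(observed)
        scores[(w1, w2)] = (mi, t_score)
    return scores


# Toy data: one very frequent pair and one rare, exclusive pair.
tokens = ["of", "the"] * 10 + ["spick", "span"]
scores = bigram_association(tokens)

mi_rare, t_rare = scores[("spick", "span")]
mi_freq, t_freq = scores[("of", "the")]
print(f"spick span: MI={mi_rare:.2f}, t={t_rare:.2f}")
print(f"of the:     MI={mi_freq:.2f}, t={t_freq:.2f}")
```

In this toy data the rare pair gets the higher MI and the frequent pair gets the higher t-score, which is the kind of distinction the abstract draws between lower-frequency, strongly associated sequences and high-frequency ones.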
