On the Similarities Between Native, Non-native and Translated Texts
This research analyzes the linguistic properties of text varieties and offers insights for computational linguistics and language processing. Its contribution is incremental, since it builds on existing methods for text comparison.
The study compares native, non-native, and translated texts using computational methods. It finds that the three text types are easily distinguishable and that non-native and translated texts are more similar to each other than either is to native texts.
We present a computational analysis of three language varieties: native, advanced non-native, and translated language. Our goal is to investigate the similarities and differences between non-native language productions and translations, contrasting both with native language. Using a collection of computational methods, we establish three main results: (1) the three types of texts are easily distinguishable; (2) non-native language and translations are closer to each other than either is to native language; and (3) some of these characteristics depend on the source or native language, while others do not, perhaps reflecting unified principles that similarly affect translations and non-native language.
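
The abstract does not say which computational methods were used. As an illustration only, the following minimal sketch shows one common way that "closeness" between text varieties could be quantified: cosine similarity between character n-gram frequency profiles. The function names (`char_ngram_profile`, `cosine_similarity`, `pairwise_similarities`), the choice of character trigrams, and the similarity measure are all assumptions made for this example, not the method of the study.

```python
from collections import Counter
from math import sqrt
from typing import Dict, Tuple


def char_ngram_profile(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams in the lowercased text.

    Hypothetical feature choice: the abstract does not name its features.
    """
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def pairwise_similarities(
    corpora: Dict[str, str], n: int = 3
) -> Dict[Tuple[str, str], float]:
    """Compute the similarity for every unordered pair of named corpora.

    `corpora` maps a variety label (e.g. "native") to its concatenated text.
    Keys of the result are label pairs in alphabetical order.
    """
    profiles = {name: char_ngram_profile(text, n) for name, text in corpora.items()}
    names = sorted(profiles)
    return {
        (x, y): cosine_similarity(profiles[x], profiles[y])
        for i, x in enumerate(names)
        for y in names[i + 1:]
    }
```

Under these assumptions, passing one concatenated text per variety to `pairwise_similarities` returns one score per pair of varieties. A claim like result (2) would then correspond to the non-native/translated pair scoring higher than either pair that includes native text, although establishing this reliably would need far more data and controls than this sketch provides.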