Translationese-index: Using Likelihood Ratios for Graded and Generalizable Measurement of Translationese
This work provides a generalizable, graded measure of translationese for natural language processing and machine translation evaluation, though the contribution is incremental because it builds on existing binary classification approaches.
The paper proposes the translationese-index (T-index), a graded metric for translationese computed from the likelihood ratios of fine-tuned language models. Using only 1-5k pairs of synthetic data for fine-tuning, the metric generalizes across genres, authors, and language pairs and aligns with human judgments.
Translationese refers to linguistic properties that usually occur in translated texts. Previous work studies translationese by framing it as a binary classification between original texts and translated texts. In this paper, we argue that translationese should be graded instead of binary and propose the first measure for translationese, the translationese-index (T-index), computed from the likelihood ratios of two contrastively fine-tuned language models (LMs). We use synthesized translations and translations in the wild to evaluate T-index's generalizability in cross-domain settings and its validity against human judgments. Our results show that T-index can generalize to unseen genres, authors, and language pairs. Moreover, T-index computed using two 0.5B LMs fine-tuned on only 1-5k pairs of synthetic data can effectively capture translationese, as demonstrated by its alignment with human pointwise ratings and pairwise judgments. Additionally, the correlation between T-index and existing machine translation (MT) quality estimation (QE) metrics such as BLEU and COMET is low, suggesting that what T-index measures is not covered by these metrics and that it can serve as a complementary metric in MT QE.
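To make the likelihood-ratio idea concrete, the following is a minimal sketch of how such a score could be computed once per-token log-probabilities are available from the two fine-tuned LMs. It is not the paper's implementation: the function name `log_likelihood_ratio`, the sign convention (higher means more translationese-like), the shared-tokenizer requirement, and the optional length normalization are all assumptions made for illustration.

```python
import math
from typing import Sequence


def log_likelihood_ratio(
    logp_translationese_lm: Sequence[float],
    logp_original_lm: Sequence[float],
    length_normalize: bool = True,
) -> float:
    """Illustrative graded score from two LMs' per-token log-probabilities.

    Both sequences score the same text, token by token. The first comes from
    an LM fine-tuned on translationese-heavy text, the second from an LM
    fine-tuned on original-like text (hypothetical setup). A shared tokenizer
    is assumed, so both sequences must have the same length.
    """
    if len(logp_translationese_lm) != len(logp_original_lm):
        raise ValueError("both models must score the same token sequence")
    if len(logp_translationese_lm) == 0:
        raise ValueError("cannot score an empty sequence")

    # log p_T(x) - log p_O(x) is the log of the likelihood ratio p_T(x) / p_O(x).
    ratio = math.fsum(logp_translationese_lm) - math.fsum(logp_original_lm)

    # Dividing by the token count is an assumption here; it keeps long and
    # short texts on a comparable scale.
    if length_normalize:
        return ratio / len(logp_translationese_lm)
    return ratio


# Made-up log-probabilities for a three-token sentence.
score = log_likelihood_ratio([-1.0, -2.0, -1.5], [-2.0, -2.5, -2.5])
```

Under this convention, a positive score means the translationese-tuned LM finds the text more likely than the original-tuned LM, and the magnitude gives the graded (rather than binary) signal the abstract argues for.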