CL | Jul 16, 2025

Translationese-index: Using Likelihood Ratios for Graded and Generalizable Measurement of Translationese

arXiv:2507.12260v2 | 2 citations | h-index: 11 | EMNLP
Originality: Incremental advance
AI Analysis

The paper provides a generalizable, graded measure of translationese for natural language processing and machine translation evaluation. The advance is incremental because it builds on existing binary classification approaches.

The paper tackles the problem of measuring translationese. It proposes a graded metric, the translationese-index (T-index), computed from the likelihood ratios of two contrastively fine-tuned language models. T-index generalizes across genres, authors, and language pairs, and it aligns with human judgments even when the models are fine-tuned on only 1-5k synthetic data pairs.

Translationese refers to linguistic properties that usually occur in translated texts. Previous works study translationese by framing it as a binary classification between original texts and translated texts. In this paper, we argue that translationese should be graded instead of binary and propose the first measure for translationese -- the translationese-index (T-index), computed from the likelihood ratios of two contrastively fine-tuned language models (LMs). We use synthesized translations and translations in the wild to evaluate T-index's generalizability in cross-domain settings and its validity against human judgments. Our results show that T-index can generalize to unseen genres, authors, and language pairs. Moreover, T-index computed using two 0.5B LMs fine-tuned on only 1-5k pairs of synthetic data can effectively capture translationese, as demonstrated by alignment with human pointwise ratings and pairwise judgments. Additionally, the correlation between T-index and existing machine translation (MT) quality estimation (QE) metrics such as BLEU and COMET is low, suggesting that T-index is not covered by these metrics and can serve as a complementary metric in MT QE.
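The likelihood-ratio idea in the abstract can be illustrated with a minimal sketch. The abstract only states that T-index is computed from the likelihood ratios of two contrastively fine-tuned LMs. The direction of the ratio (translationese model over original-text model), the per-token length normalization, and the function name `log_likelihood_ratio` are all assumptions made here for illustration, not the paper's definition. The per-token log-probabilities are made-up numbers standing in for scores that the two fine-tuned models would produce for the same sentence.

```python
from typing import Sequence


def log_likelihood_ratio(
    logp_translationese: Sequence[float],
    logp_original: Sequence[float],
    normalize: bool = True,
) -> float:
    """Return log p_T(x) - log p_O(x) for one sentence x.

    Both inputs are per-token log-probabilities of the SAME tokenized
    sentence, one list per model. Dividing by the token count
    (normalize=True) is an assumption, not taken from the paper.
    """
    if len(logp_translationese) != len(logp_original):
        raise ValueError("both models must score the same token sequence")
    if not logp_translationese:
        raise ValueError("cannot score an empty sequence")

    # Sentence log-likelihood is the sum of per-token log-probabilities.
    ll_translationese = sum(logp_translationese)
    ll_original = sum(logp_original)

    ratio = ll_translationese - ll_original
    return ratio / len(logp_translationese) if normalize else ratio


# Hypothetical per-token log-probabilities for one three-token sentence.
scores_t = [-1.0, -2.0, -1.5]  # under the translationese-tuned LM
scores_o = [-2.0, -2.5, -2.5]  # under the original-text-tuned LM

score = log_likelihood_ratio(scores_t, scores_o)
raw = log_likelihood_ratio(scores_t, scores_o, normalize=False)
```

Under these assumptions, a positive score means the translationese-tuned model finds the sentence more likely than the original-text model does, and the magnitude gives a graded rather than binary signal.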

