CL, AI · Sep 5, 2025

No Translation Needed: Forecasting Quality from Fertility and Metadata

Jessica M. Lundin, Ada Zhang, David Adelani, Cody Carroll

arXiv:2509.05425v1 · 2.7 · h-index: 4

Originality: Synthesis-oriented

AI Analysis

The paper provides a method for multilingual evaluation and quality estimation, but the contribution is incremental, as it applies existing techniques to a new task.

The paper tackles the problem of predicting translation quality without running the translation system, using features such as token fertility and linguistic metadata, and achieves R² scores of 0.66 (XX→English) and 0.72 (English→XX) on the FLORES-200 benchmark.
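The page does not define "token fertility" precisely. One plausible reading, assumed here, is subword tokens per whitespace-delimited word, with source-side fertility divided by English-side fertility to form a ratio. The sketch computes that ratio, raw token counts, and one-hot metadata features for a single sentence pair. `toy_tokenizer` and `pair_features` are hypothetical helpers, not the authors' code, and the metadata values are illustrative. A real pipeline would use the tokenizer of the model being evaluated, and whitespace word counts would need replacing for scripts written without spaces (e.g. Thai or Chinese).

```python
from typing import Callable, Dict, List

Tokenizer = Callable[[str], List[str]]


def toy_tokenizer(text: str) -> List[str]:
    """Hypothetical stand-in for a real subword tokenizer.

    Splits each whitespace-delimited word into chunks of at most three
    characters, so longer words yield more "subword" tokens.
    """
    pieces: List[str] = []
    for word in text.split():
        pieces.extend(word[i:i + 3] for i in range(0, len(word), 3))
    return pieces


def fertility(text: str, tokenize: Tokenizer) -> float:
    """Assumed definition: subword tokens per whitespace-delimited word."""
    words = text.split()
    return len(tokenize(text)) / len(words) if words else 0.0


def pair_features(src: str, eng: str, tokenize: Tokenizer,
                  metadata: Dict[str, str]) -> Dict[str, float]:
    """Token counts, fertility ratio, and one-hot metadata for one sentence pair."""
    f_src = fertility(src, tokenize)
    f_eng = fertility(eng, tokenize)
    features = {
        "src_tokens": float(len(tokenize(src))),
        "eng_tokens": float(len(tokenize(eng))),
        # A ratio above 1 means the source is split more finely than English.
        "fertility_ratio": f_src / f_eng if f_eng else 0.0,
    }
    # One-hot encode categorical metadata (family, script, region).
    for key, value in metadata.items():
        features[f"{key}={value}"] = 1.0
    return features


# Illustrative Swahili/English pair; metadata values are examples only.
feats = pair_features("Habari za asubuhi", "Good morning", toy_tokenizer,
                      {"family": "Niger-Congo", "script": "Latin", "region": "Africa"})
# Source: 6 tokens / 3 words = 2.0; English: 5 tokens / 2 words = 2.5; ratio = 0.8
print(feats)
```

Keeping the metadata as one-hot columns lets a tree-based model split on language family, script, or region directly, alongside the numeric fertility features.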

We show that translation quality can be predicted with surprising accuracy without ever running the translation system itself. Using only a handful of features (token fertility ratios, token counts, and basic linguistic metadata: language family, script, and region), we can forecast ChrF scores for GPT-4o translations across 203 languages in the FLORES-200 benchmark. Gradient boosting models achieve R² = 0.66 for XX→English and R² = 0.72 for English→XX. Feature importance analyses reveal that typological factors dominate predictions into English, while fertility plays a larger role for translations into diverse target languages. These findings suggest that translation quality is shaped by both token-level fertility and broader linguistic typology, offering new insights for multilingual evaluation and quality estimation.
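The abstract names gradient boosting but not a library or hyperparameters. As a minimal, dependency-free sketch of the technique, the code boosts depth-1 regression trees (stumps) under squared loss: each round fits a stump to the current residuals, which are the negative gradient of squared loss, and adds a shrunken copy of it to the ensemble. The fit is then scored with R². Function names, the learning rate, the number of rounds, and the six-row dataset are hypothetical; the targets are invented ChrF-like numbers, not results from the paper. Off-the-shelf libraries use deeper trees and regularization, so this is an illustration only.

```python
from typing import List, Sequence, Tuple

Stump = Tuple[int, float, float, float]  # (feature index, threshold, left value, right value)
Model = Tuple[float, float, List[Stump]]  # (base prediction, learning rate, stumps)


def fit_stump(X: Sequence[Sequence[float]], residuals: Sequence[float]) -> Stump:
    """Depth-1 regression tree minimizing squared error on the residuals."""
    best = None
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [r for row, r in zip(X, residuals) if row[j] <= thr]
            right = [r for row, r in zip(X, residuals) if row[j] > thr]
            left_mean = sum(left) / len(left)
            right_mean = sum(right) / len(right)
            sse = (sum((r - left_mean) ** 2 for r in left)
                   + sum((r - right_mean) ** 2 for r in right))
            if best is None or sse < best[0]:
                best = (sse, j, thr, left_mean, right_mean)
    if best is None:  # every feature is constant: predict the mean residual
        mean = sum(residuals) / len(residuals)
        return (0, float("inf"), mean, mean)
    return best[1:]


def predict_stump(stump: Stump, row: Sequence[float]) -> float:
    j, thr, left_value, right_value = stump
    return left_value if row[j] <= thr else right_value


def fit_gbm(X, y, n_rounds: int = 100, learning_rate: float = 0.1) -> Model:
    base = sum(y) / len(y)  # start from the mean target
    preds = [base] * len(y)
    stumps: List[Stump] = []
    for _ in range(n_rounds):
        # For squared loss, the negative gradient is just the residual.
        residuals = [t - p for t, p in zip(y, preds)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * predict_stump(stump, row) for p, row in zip(preds, X)]
    return base, learning_rate, stumps


def predict_gbm(model: Model, row: Sequence[float]) -> float:
    base, learning_rate, stumps = model
    return base + learning_rate * sum(predict_stump(s, row) for s in stumps)


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all targets are equal")
    return 1.0 - ss_res / ss_tot


# Synthetic rows: [fertility_ratio, script_is_latin]; targets are made-up ChrF-like scores.
X = [[0.8, 1.0], [1.0, 1.0], [1.2, 1.0], [1.9, 0.0], [2.4, 0.0], [3.1, 0.0]]
y = [62.0, 60.0, 57.0, 41.0, 36.0, 30.0]
model = fit_gbm(X, y, n_rounds=200, learning_rate=0.1)
preds = [predict_gbm(model, row) for row in X]
print(f"in-sample R^2 = {r_squared(y, preds):.3f}")
```

The R² printed here is in-sample and therefore optimistic; a fair estimate would score languages held out from training.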

