CL · Aug 11, 2025

Preliminary Ranking of WMT25 General Machine Translation Systems

Tom Kocmi, Eleftherios Avramidis, Rachel Bawden, Ondřej Bojar, Konstantin Dranch, Anton Dvorkovich, Sergey Dukanov, Natalia Fedorova, Mark Fishel, Markus Freitag, Thamme Gowda, Roman Grundkiewicz

ETH Zurich, Microsoft

arXiv:2508.14909v24.93 citations · h-index: 45

Originality: Synthesis-oriented

AI Analysis

This paper provides incremental, temporary guidance for participants in a specific machine translation competition.

The paper presents preliminary rankings of machine translation systems from the WMT25 shared task using automatic metrics, noting potential biases toward re-ranking techniques, with the final rankings to be based on human evaluation.

We present the preliminary rankings of machine translation (MT) systems submitted to the WMT25 General Machine Translation Shared Task, as determined by automatic evaluation metrics. Because these rankings are derived from automatic evaluation, they may exhibit a bias toward systems that employ re-ranking techniques, such as Quality Estimation or Minimum Bayes Risk decoding. The official WMT25 ranking will be based on human evaluation, which is more reliable and will supersede these results. The purpose of releasing these findings now is to assist task participants with their system description papers, not to provide final findings.
