CL · AI · HC · May 29, 2025

Unsupervised Word-level Quality Estimation for Machine Translation Through the Lens of Annotators (Dis)agreement

Gabriele Sarti, Vilém Zouhar, Malvina Nissim, Arianna Bisazza

arXiv:2505.23183v1 · 6.72 citations · h-index: 12 · Has Code · EMNLP

Originality: Incremental advance

AI Analysis

This work addresses the need for efficient quality estimation in machine translation to assist translators. It is an incremental advance, since it builds on existing interpretability techniques.

The paper tackles word-level quality estimation for machine translation with unsupervised methods based on language model interpretability and uncertainty quantification. Its evaluation of 14 metrics across 12 translation directions highlights the potential of these unsupervised metrics and the shortcomings of supervised methods under label uncertainty.
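To make the idea of uncertainty-based, unsupervised word-level scoring concrete, here is a minimal sketch. It assumes we already have the translation model's next-token distribution at each decoding step. The toy distributions, the choice of surprisal as the score, the max-over-subwords aggregation and the 0.5 threshold are all illustrative assumptions. They are not the paper's specific metrics.

```python
import math
from typing import Dict, List, Tuple

def token_surprisal(probs: Dict[str, float], token: str) -> float:
    """Surprisal (negative log-probability, in nats) of the generated token."""
    return -math.log(probs[token])

def token_entropy(probs: Dict[str, float]) -> float:
    """Shannon entropy (in nats) of a next-token distribution."""
    return -sum(p * math.log(p) for p in probs.values() if p > 0)

def word_scores(token_scores: List[float], word_ids: List[int]) -> List[float]:
    """Aggregate subword scores into word scores by taking the per-word maximum."""
    scores = [float("-inf")] * (max(word_ids) + 1)
    for score, w in zip(token_scores, word_ids):
        scores[w] = max(scores[w], score)
    return scores

def flag_errors(scores: List[float], threshold: float) -> List[bool]:
    """Mark a word as a likely error when its score exceeds the threshold."""
    return [s > threshold for s in scores]

# Toy decoding trace (hypothetical): at each step, the model's distribution
# over a few candidate tokens and the token it actually generated.
steps: List[Tuple[Dict[str, float], str]] = [
    ({"Das": 0.9, "Die": 0.1}, "Das"),
    ({"Ha": 0.5, "Hau": 0.5}, "Hau"),
    ({"s": 0.95, "se": 0.05}, "s"),
]
word_ids = [0, 1, 1]  # "Hau" + "s" together form the second word, "Haus"

surprisals = [token_surprisal(p, t) for p, t in steps]
entropies = [token_entropy(p) for p, _ in steps]  # alternative per-token score
scores = word_scores(surprisals, word_ids)
flags = flag_errors(scores, threshold=0.5)
print(flags)  # [False, True]
```

Entropy can replace surprisal as the per-token score. Taking the maximum over subwords is only one simple way to map token scores onto words.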

Word-level quality estimation (WQE) aims to automatically identify fine-grained error spans in machine-translated outputs and has found many uses, including assisting translators during post-editing. Modern WQE techniques are often expensive, involving prompting of large language models or ad-hoc training on large amounts of human-labeled data. In this work, we investigate efficient alternatives exploiting recent advances in language model interpretability and uncertainty quantification to identify translation errors from the inner workings of translation models. In our evaluation spanning 14 metrics across 12 translation directions, we quantify the impact of human label variation on metric performance by using multiple sets of human labels. Our results highlight the untapped potential of unsupervised metrics, the shortcomings of supervised methods when faced with label uncertainty, and the brittleness of single-annotator evaluation practices.
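The concern about single-annotator evaluation can be illustrated with a small sketch. It scores the same metric outputs against several annotators' binary error labels and compares the results. Average precision is used here only as an example ranking metric, and it may not match the paper's evaluation setup. All scores and labels are hypothetical toy values.

```python
from statistics import mean
from typing import Dict, List

def average_precision(scores: List[float], labels: List[int]) -> float:
    """Average precision of continuous scores against binary error labels.

    Words are ranked by descending score (ties keep their original order);
    AP averages the precision at each rank where a labeled error appears.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits = 0
    precisions = []
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0

# Toy word-level scores from one unsupervised metric (hypothetical values).
scores = [0.1, 0.9, 0.4, 0.7]

# Binary error labels for the same four words from three annotators (hypothetical).
annotators: Dict[str, List[int]] = {
    "A": [0, 1, 0, 1],
    "B": [0, 1, 1, 0],
    "C": [0, 0, 0, 1],
}

ap = {name: average_precision(scores, labels) for name, labels in annotators.items()}
print(ap)
print(f"mean AP = {mean(ap.values()):.3f}, min = {min(ap.values())}, max = {max(ap.values())}")
```

With these toy labels, the same metric scores a perfect 1.0 against annotator A but only 0.5 against annotator C. Evaluating against a single annotator would hide this spread.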

