Alleviating Distribution Shift in Synthetic Data for Machine Translation Quality Estimation
This work addresses data scarcity for machine translation quality estimation, which is crucial for developing reward models in translation tasks, though it appears incremental as it builds on existing synthetic data methods.
The paper tackles the problem of distribution shift in synthetic data for machine translation quality estimation by introducing DCSQE, a framework that uses constrained beam search and diverse generation models to align pseudo and real translations, resulting in outperforming state-of-the-art baselines like CometKiwi in supervised and unsupervised settings.
Quality Estimation (QE) models evaluate the quality of machine translations without reference translations, serving as the reward models for the translation task. Due to the data scarcity, synthetic data generation has emerged as a promising solution. However, synthetic QE data often suffers from distribution shift, which can manifest as discrepancies between pseudo and real translations, or in pseudo labels that do not align with human preferences. To tackle this issue, we introduce DCSQE, a novel framework for alleviating distribution shift in synthetic QE data. To reduce the difference between pseudo and real translations, we employ the constrained beam search algorithm and enhance translation diversity through the use of distinct generation models. DCSQE uses references, i.e., translation supervision signals, to guide both the generation and annotation processes, enhancing the quality of token-level labels. DCSQE further identifies the shortest phrase covering consecutive error tokens, mimicking human annotation behavior, to assign the final phrase-level labels. Specially, we underscore that the translation model can not annotate translations of itself accurately. Extensive experiments demonstrate that DCSQE outperforms SOTA baselines like CometKiwi in both supervised and unsupervised settings. Further analysis offers insights into synthetic data generation that could benefit reward models for other tasks. The code is available at https://github.com/NJUNLP/njuqe.