Alibaba-Translate China's Submission for WMT 2022 Quality Estimation Shared Task
This work addresses the need for accurate sentence-level quality estimation in machine translation, both in the multilingual setting and for specific language pairs. Its contribution is incremental, since it builds on existing pre-trained models and past competition data.
The paper tackles sentence-level quality estimation for machine translation by applying the UniTE framework, which combines three input formats with a pre-trained language model. At WMT 2022, the submission ranked first overall in the Multilingual and English-Russian settings and second in the English-German and Chinese-English settings.
In this paper, we present our submission to the sentence-level MQM benchmark of the Quality Estimation Shared Task, named UniTE (Unified Translation Evaluation). Specifically, our systems employ the UniTE framework, which combines three types of input formats during training with a pre-trained language model. First, we use pseudo-labeled examples for continued pre-training. Notably, to reduce the gap between pre-training and fine-tuning, we apply data pruning and a ranking-based score normalization strategy. For fine-tuning, we use both Direct Assessment (DA) and Multidimensional Quality Metrics (MQM) data from past years' WMT competitions. Finally, we collect the source-only evaluation results and ensemble the predictions of two UniTE models, whose backbones are XLM-R and InfoXLM, respectively. Results show that our models reach 1st overall ranking in the Multilingual and English-Russian settings and 2nd overall ranking in the English-German and Chinese-English settings, demonstrating relatively strong performance in this year's quality estimation competition.
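The abstract does not spell out the exact ranking-based score normalization or the ensembling rule. The sketch below is therefore only an illustration under stated assumptions. It assumes that ranking-based normalization means replacing each raw pseudo-label score with its rank in the dataset, scaled to [0, 1] with tied scores sharing their mean rank. It also assumes that the ensemble is a (weighted) average of per-segment predictions. The function names `rank_normalize` and `ensemble` are hypothetical and are not taken from the authors' code.

```python
from typing import List, Optional, Sequence


def rank_normalize(scores: Sequence[float]) -> List[float]:
    """Hypothetical ranking-based normalization (an assumption, not the paper's exact rule).

    Each score is replaced by its rank, scaled to [0, 1]: the lowest score maps
    to 0.0 and the highest to 1.0. Tied scores share the mean of their ranks.
    """
    n = len(scores)
    if n == 1:
        return [0.5]  # a single score has no meaningful rank; use the midpoint
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied scores starting at position i.
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        mean_rank = (i + j) / 2
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return [r / (n - 1) for r in ranks]


def ensemble(
    predictions: Sequence[Sequence[float]],
    weights: Optional[Sequence[float]] = None,
) -> List[float]:
    """Hypothetical ensemble: weighted average of per-segment scores from several models."""
    if not predictions:
        raise ValueError("need at least one model's predictions")
    m = len(predictions)
    if weights is None:
        weights = [1.0 / m] * m  # assumption: equal weights by default
    if len(weights) != m:
        raise ValueError("one weight per model is required")
    n = len(predictions[0])
    if any(len(p) != n for p in predictions):
        raise ValueError("all models must score the same segments")
    return [sum(w * p[i] for w, p in zip(weights, predictions)) for i in range(n)]
```

For example, `rank_normalize([3.0, 1.0, 2.0])` maps the scores to their scaled ranks. `ensemble([xlmr_scores, infoxlm_scores])` averages the segment-level outputs of two models (the variable names are placeholders). Averaging is only one plausible combination rule. If the two backbones produce scores on different scales, normalizing each model's outputs before averaging would be a reasonable variant.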