Regressive Ensemble for Machine Translation Quality Evaluation
This work addresses the need for more accurate and robust machine translation evaluation metrics, particularly for unseen languages. The contribution is incremental in that it builds on existing metrics.
The authors evaluate machine translation quality with a regressive ensemble that combines novel and established metrics. The ensemble significantly outperforms single metrics in both monolingual and zero-shot cross-lingual settings, and a strong reference-free baseline outperforms BLEU and METEOR.
This work introduces a simple regressive ensemble for evaluating machine translation quality based on a set of novel and established metrics. We evaluate the ensemble by its correlation with expert-based MQM scores from the WMT 2021 Metrics workshop. In both monolingual and zero-shot cross-lingual settings, we show a significant performance improvement over single metrics. In the cross-lingual settings, we also demonstrate that an ensemble approach applies well to unseen languages. Furthermore, we identify a strong reference-free baseline that consistently outperforms the commonly used BLEU and METEOR measures and significantly improves our ensemble's performance.
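To make the idea of a regressive ensemble concrete, the following is a minimal sketch, not the authors' implementation. It assumes a plain linear least-squares regressor (the abstract does not say which regressor is used), three hypothetical metric scores per segment, and synthetic stand-ins for the human scores; no WMT or MQM data is involved. The helper names `fit_linear_ensemble`, `predict`, and `pearson` are illustrative. The sketch fits the ensemble on one set of segments and then compares its Pearson correlation with the held-out scores against that of each single metric.

```python
import math
import random


def fit_linear_ensemble(features, targets, ridge=1e-6):
    """Least-squares weights (last entry is the bias) via the normal equations."""
    rows = [list(f) + [1.0] for f in features]  # append a constant bias column
    d = len(rows[0])
    # Build A = X^T X + ridge * I and b = X^T y.
    a = [[sum(r[i] * r[j] for r in rows) + (ridge if i == j else 0.0)
          for j in range(d)] for i in range(d)]
    b = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(d)]
    # Gaussian elimination with partial pivoting.
    for col in range(d):
        pivot = max(range(col, d), key=lambda k: abs(a[k][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for k in range(col + 1, d):
            factor = a[k][col] / a[col][col]
            for c in range(col, d):
                a[k][c] -= factor * a[col][c]
            b[k] -= factor * b[col]
    # Back substitution.
    w = [0.0] * d
    for i in reversed(range(d)):
        w[i] = (b[i] - sum(a[i][j] * w[j] for j in range(i + 1, d))) / a[i][i]
    return w


def predict(weights, features):
    """Ensemble score for each segment: weighted sum of its metrics plus bias."""
    return [sum(w * x for w, x in zip(weights, list(f) + [1.0])) for f in features]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


# Synthetic data (assumption): each metric is a noisy view of a hidden quality score.
rng = random.Random(0)


def make_segment():
    quality = rng.random()
    metrics = [quality + rng.gauss(0.0, 0.3) for _ in range(3)]
    return metrics, quality


train = [make_segment() for _ in range(500)]
test = [make_segment() for _ in range(500)]

weights = fit_linear_ensemble([m for m, _ in train], [q for _, q in train])
test_scores = [q for _, q in test]
ensemble_r = pearson(predict(weights, [m for m, _ in test]), test_scores)
single_r = [pearson([m[i] for m, _ in test], test_scores) for i in range(3)]

print("single-metric correlations:", [round(r, 3) for r in single_r])
print("ensemble correlation:", round(ensemble_r, 3))
```

In a zero-shot cross-lingual variant of this sketch, the training segments would come from one set of language pairs and the test segments from a language pair not seen during fitting; the fitting and correlation steps would stay the same.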