CL, Oct 29, 2020

Unbabel's Participation in the WMT20 Metrics Shared Task

arXiv:2010.15535v1, 996 citations
Originality: Incremental advance
AI Analysis

This work addresses the problem of evaluating machine translation quality, which matters to both researchers and practitioners. It is an incremental advance because it builds on the existing COMET framework.

The Unbabel team tackled the WMT 2020 Shared Task on Metrics with models built on the COMET framework: several estimator models, a ranking model, and a method for converting segment-level predictions into a document-level score. On test sets from the previous year, the systems achieve strong results for all language pairs and in many cases set a new state of the art.

We present the contribution of the Unbabel team to the WMT 2020 Shared Task on Metrics. We intend to participate in the segment-level, document-level and system-level tracks for all language pairs, as well as the 'QE as a Metric' track. Accordingly, we illustrate results of our models in these tracks with reference to test sets from the previous year. Our submissions build upon the recently proposed COMET framework: we train several estimator models to regress on different human-generated quality scores and a novel ranking model trained on relative ranks obtained from Direct Assessments. We also propose a simple technique for converting segment-level predictions into a document-level score. Overall, our systems achieve strong results for all language pairs on previous test sets and in many cases set a new state of the art.
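The abstract does not say how segment-level predictions are turned into a document-level score. As a minimal sketch, assuming a length-weighted average of segment scores (an illustrative choice, not confirmed by the text above), the aggregation could look like the following. The function name `document_score` and its parameters are hypothetical.

```python
from typing import Sequence


def document_score(
    segment_scores: Sequence[float],
    segment_lengths: Sequence[int],
) -> float:
    """Aggregate segment-level scores into one document-level score.

    ASSUMPTION: a length-weighted average, so that longer segments
    contribute more to the document score. The abstract only states
    that a "simple technique" is used; this is one plausible reading.
    """
    if len(segment_scores) != len(segment_lengths):
        raise ValueError("scores and lengths must have the same size")
    total_length = sum(segment_lengths)
    if total_length == 0:
        raise ValueError("document must contain at least one token")
    weighted_sum = sum(
        score * length
        for score, length in zip(segment_scores, segment_lengths)
    )
    return weighted_sum / total_length


# Two segments: a short one with a low score, a longer one with a high score.
print(document_score([0.5, 1.0], [1, 3]))
```

With uniform lengths this reduces to a plain mean of the segment scores, which would be an equally reasonable baseline under the same assumption.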

Code Implementations: 1 repo