CL, Feb 21, 2022

USCORE: An Effective Approach to Fully Unsupervised Evaluation Metrics for Machine Translation

arXiv:2202.10062v3270 citations
Originality: Highly original
AI Analysis

The paper addresses the need for machine translation evaluation metrics that still work when human scores, reference translations, or parallel data are unavailable. Its approach is novel rather than incremental.

The paper tackles the problem of evaluating machine translation without supervision by developing fully unsupervised metrics that leverage similarities between evaluation, parallel corpus mining, and MT systems, resulting in metrics that outperform supervised competitors on 4 out of 5 datasets.

The vast majority of evaluation metrics for machine translation are supervised, i.e., (i) are trained on human scores, (ii) assume the existence of reference translations, or (iii) leverage parallel data. This hinders their applicability to cases where such supervision signals are not available. In this work, we develop fully unsupervised evaluation metrics. To do so, we leverage similarities and synergies between evaluation metric induction, parallel corpus mining, and MT systems. In particular, we use an unsupervised evaluation metric to mine pseudo-parallel data, which we use to remap deficient underlying vector spaces (in an iterative manner) and to induce an unsupervised MT system, which then provides pseudo-references as an additional component in the metric. Finally, we also induce unsupervised multilingual sentence embeddings from pseudo-parallel data. We show that our fully unsupervised metrics are effective, i.e., they beat supervised competitors on 4 out of our 5 evaluation datasets. We make our code publicly available.
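The mine-then-remap loop described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation. It assumes sentence embeddings are already available as NumPy arrays, and it swaps in two simple stand-ins: mutual nearest-neighbour search under cosine similarity for the mining step, and an orthogonal Procrustes solution for the remapping step. All function names are hypothetical, and the unsupervised MT system and pseudo-reference components are omitted.

```python
import numpy as np


def normalize(x):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def mine_pairs(src, tgt):
    """Return (i, j) index pairs that are mutual nearest neighbours.

    Stand-in for pseudo-parallel data mining (an assumption of this sketch).
    """
    sim = normalize(src) @ normalize(tgt).T
    fwd = sim.argmax(axis=1)  # best target row for each source row
    bwd = sim.argmax(axis=0)  # best source row for each target row
    return [(i, int(j)) for i, j in enumerate(fwd) if bwd[j] == i]


def procrustes(src, tgt):
    """Orthogonal matrix W minimising ||src @ W - tgt|| (Frobenius norm)."""
    u, _, vt = np.linalg.svd(src.T @ tgt)
    return u @ vt


def iterative_remap(src, tgt, rounds=3):
    """Alternate between mining pairs and re-fitting the source-to-target map."""
    w = np.eye(src.shape[1])
    pairs = []
    for _ in range(rounds):
        pairs = mine_pairs(src @ w, tgt)
        if not pairs:
            break
        i, j = map(list, zip(*pairs))
        w = procrustes(src[i], tgt[j])
    return w, pairs
```

Calling `iterative_remap(src, tgt)` returns the learned map and the last set of mined pairs. The cosine similarity between `src @ w` and `tgt` could then serve as a reference-free score, which is one plausible reading of how the remapped space feeds back into the metric.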

Code Implementations: 1 repo
