CL, AI · Nov 16, 2022

MT Metrics Correlate with Human Ratings of Simultaneous Speech Translation

arXiv:2211.08633v2227 citations · h-index: 48
Originality: Synthesis-oriented
AI Analysis

This work addresses the need for efficient evaluation in simultaneous speech translation by validating existing metrics, though it is incremental as it applies known methods to a new context.

The study investigates whether offline machine translation metrics correlate with human ratings of simultaneous speech translation. It finds that metrics such as BLEU and COMET correlate well with Continuous Ratings and can serve as proxies for human evaluation, and that correlations are higher when a translation, rather than simultaneous interpreting, is used as the reference.

There have been several meta-evaluation studies on the correlation between human ratings and offline machine translation (MT) evaluation metrics such as BLEU, chrF2, BERTScore and COMET. These metrics have been used to evaluate simultaneous speech translation (SST), but their correlations with human ratings of SST, which have recently been collected as Continuous Ratings (CR), are unclear. In this paper, we leverage the evaluations of candidate systems submitted to the English-German SST task at IWSLT 2022 and conduct an extensive correlation analysis of CR and the aforementioned metrics. Our study reveals that the offline metrics are well correlated with CR and can be reliably used for evaluating machine translation in simultaneous mode, with some limitations on the test set size. We conclude that, given the current quality levels of SST, these metrics can be used as proxies for CR, alleviating the need for large-scale human evaluation. Additionally, we observe that correlations of the metrics with translation as a reference are significantly higher than with simultaneous interpreting as a reference, and thus we recommend the former for reliable evaluation.
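As a rough illustration of the kind of correlation analysis the abstract describes, the sketch below computes a Pearson correlation between automatic metric scores and Continuous Ratings. It is a minimal sketch under stated assumptions, not the authors' procedure: this section does not say which correlation coefficient or scoring granularity the paper uses, the function name `pearson` and both variable names are hypothetical, and the numbers are made-up placeholders rather than results from the paper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations from the means (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Placeholder values for illustration only; NOT taken from the paper.
# One entry per evaluated item (for example, one document or one system).
metric_scores = [18.2, 21.5, 24.9, 27.3, 30.1]   # e.g. an offline MT metric
continuous_ratings = [2.1, 2.6, 2.4, 3.2, 3.5]   # e.g. averaged human CR

r = pearson(metric_scores, continuous_ratings)
print(f"Pearson r = {r:.3f}")
```

To compare reference types as the abstract suggests, one would compute the metric twice, once against translation references and once against interpreting references, and compare the two resulting correlations with CR.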
