CLNov 2, 2019

Machine Translation Evaluation using Bi-directional Entailment

arXiv:1911.00681v17 citations
Originality Incremental advance
AI Analysis

This addresses the need for more accurate evaluation metrics in machine translation, offering an incremental improvement over existing methods.

The paper tackles the problem of evaluating machine translation by proposing a new metric based on bi-directional entailment, which uses deep learning to assess semantic similarity instead of n-gram overlap, and shows it achieves better correlation with human scores on WMT datasets compared to traditional metrics like BLEU.

In this paper, we propose a new metric for Machine Translation (MT) evaluation, based on bi-directional entailment. We show that machine generated translation can be evaluated by determining paraphrasing with a reference translation provided by a human translator. We hypothesize, and show through experiments, that paraphrasing can be detected by evaluating entailment relationship in the forward and backward direction. Unlike conventional metrics, like BLEU or METEOR, our approach uses deep learning to determine the semantic similarity between candidate and reference translation for generating scores rather than relying upon simple n-gram overlap. We use BERT's pre-trained implementation of transformer networks, fine-tuned on MNLI corpus, for natural language inferencing. We apply our evaluation metric on WMT'14 and WMT'17 dataset to evaluate systems participating in the translation task and find that our metric has a better correlation with the human annotated score compared to the other traditional metrics at system level.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes