An Automatic Machine Translation Evaluation Metric Based on Dependency Parsing Model
This work addresses the need for more accurate and efficient automatic evaluation metrics in machine translation, though it is incremental as it builds on existing syntax-based approaches.
The paper tackles the problem of syntax-based machine translation evaluation by proposing a novel metric based on dependency parsing models, which avoids human-defined sub-structures and redundant computations. It achieves state-of-the-art performance at the system level and is comparable to METEOR at the sentence level.
Most of the syntax-based metrics obtain the similarity by comparing the sub-structures extracted from the trees of hypothesis and reference. These sub-structures are defined by human and can't express all the information in the trees because of the limited length of sub-structures. In addition, the overlapped parts between these sub-structures are computed repeatedly. To avoid these problems, we propose a novel automatic evaluation metric based on dependency parsing model, with no need to define sub-structures by human. First, we train a dependency parsing model by the reference dependency tree. Then we generate the hypothesis dependency tree and the corresponding probability by the dependency parsing model. The quality of the hypothesis can be judged by this probability. In order to obtain the lexicon similarity, we also introduce the unigram F-score to the new metric. Experiment results show that the new metric gets the state-of-the-art performance on system level, and is comparable with METEOR on sentence level.