CL Aug 13, 2018

Automatic Reference-Based Evaluation of Pronoun Translation Misses the Point

arXiv:1808.04164v1 32.01096 citations

Originality: Synthesis-oriented

AI Analysis

This work addresses how to evaluate pronoun translation accuracy, a problem of interest to NLP researchers. It is incremental in that it critiques existing metrics without introducing new methods.

The study compared two automated metrics for pronoun translation, APT and AutoPRF, against human judgments on translations of the PROTEST test suite. It found only some correlation with the human judgments and identified a range of issues that limit the automated metrics' performance. The authors recommend semi-automatic metrics and test suites in place of fully automatic ones.

We compare the performance of the APT and AutoPRF metrics for pronoun translation against a manually annotated dataset comprising human judgements as to the correctness of translations of the PROTEST test suite. Although there is some correlation with the human judgements, a range of issues limit the performance of the automated metrics. Instead, we recommend the use of semi-automatic metrics and test suites in place of fully automatic metrics.
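As a rough illustration of what comparing an automatic metric against human judgements can involve, the sketch below scores per-example agreement between two sets of binary decisions. It is not the authors' procedure and does not implement APT or AutoPRF. The function name `agreement`, the variables, and the toy labels are all hypothetical, and the sketch assumes each pronoun translation has been reduced to a single correct/incorrect decision by both the metric and a human annotator.

```python
from typing import Dict, Sequence


def agreement(metric_ok: Sequence[bool], human_ok: Sequence[bool]) -> Dict[str, float]:
    """Score binary metric decisions against human judgements (treated as gold)."""
    if len(metric_ok) != len(human_ok):
        raise ValueError("inputs must have the same length")
    if not metric_ok:
        raise ValueError("inputs must be non-empty")

    pairs = list(zip(metric_ok, human_ok))
    # Confusion counts, with "correct translation" as the positive class.
    tp = sum(m and h for m, h in pairs)
    fp = sum(m and not h for m, h in pairs)
    fn = sum(not m and h for m, h in pairs)
    tn = sum(not m and not h for m, h in pairs)

    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Made-up toy labels for illustration only; not data from the paper.
metric_ok = [True, True, False, False, True, False]
human_ok = [True, False, True, False, True, False]
scores = agreement(metric_ok, human_ok)
```

In this toy setting, disagreements show up as false positives (the metric accepts a translation the annotator rejected) or false negatives (the metric rejects one the annotator accepted).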
