MMTE: Corpus and Metrics for Evaluating Machine Translation Quality of Metaphorical Language
This work addresses a gap in MT evaluation for figurative language. The contribution is incremental: it builds on existing evaluation methods by focusing on one specific aspect, figurative quality.
The paper tackles the problem of evaluating machine translation quality for metaphorical language, proposing new human evaluation metrics and a multilingual parallel metaphor corpus, and finds that translations of figurative expressions differ from literal ones.
Machine Translation (MT) has developed rapidly since the release of Large Language Models. Current MT evaluation is performed either by comparison with reference human translations or by predicting quality scores from human-labeled data. However, these mainstream evaluation methods mainly focus on fluency and factual reliability, whilst paying little attention to figurative quality. In this paper, we investigate the figurative quality of MT and propose a set of human evaluation metrics focused on the translation of figurative language. We additionally present a multilingual parallel metaphor corpus generated by post-editing. Our evaluation protocol is designed to estimate four aspects of MT: Metaphorical Equivalence, Emotion, Authenticity, and Quality. In doing so, we observe that translations of figurative expressions display different traits from translations of literal ones.