Proverbs Run in Pairs: Evaluating Proverb Translation Capability of Large Language Model
This paper addresses culturally aware translation, a challenge for NLP researchers and practitioners, but its contribution is incremental: it focuses on one specific linguistic phenomenon and offers no major methodological breakthrough.
The paper studies the translation of proverbs, a culturally rooted element of language, using neural machine translation (NMT) models and large language models (LLMs). It finds that LLMs generally outperform NMT models, that the studied models translate well between languages with similar cultural backgrounds, and that current automatic metrics such as BLEU are inadequate for evaluating proverb translation.
Although machine translation (MT) has achieved remarkable performance, the translation of cultural elements of language, such as idioms, proverbs, and colloquial expressions, remains underexplored. This paper investigates the capability of state-of-the-art neural machine translation (NMT) models and large language models (LLMs) in translating proverbs, which are deeply rooted in cultural contexts. We construct a translation dataset of standalone proverbs and proverbs in conversation for four language pairs. Our experiments show that the studied models can achieve good translation between languages with similar cultural backgrounds, and that LLMs generally outperform NMT models in proverb translation. Furthermore, we find that current automatic evaluation metrics such as BLEU, CHRF++ and COMET are inadequate for reliably assessing the quality of proverb translation, highlighting the need for more culturally aware evaluation metrics.
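One way to see why surface-overlap metrics can mislead on proverbs is the minimal sketch below. It implements a simplified character n-gram F-score in the spirit of chrF; it is not the sacreBLEU implementation of CHRF++ and not the paper's evaluation code. The function names (`char_ngrams`, `simple_chrf`) and the three English sentences are hypothetical illustrations, not items from the paper's dataset. The sketch compares a culturally equivalent proverb, which shares few characters with the reference, against an output that reverses the meaning but copies most of the reference's wording.

```python
from collections import Counter


def char_ngrams(text, n):
    """Count the character n-grams of a lowercased string."""
    s = text.lower()
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def simple_chrf(hypothesis, reference, max_n=3, beta=2.0):
    """Simplified chrF-like score: an F-beta over character n-gram
    precision and recall, each averaged over n = 1..max_n.
    This is an illustrative approximation, not the official metric."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        hyp_total = sum(hyp.values())
        ref_total = sum(ref.values())
        if hyp_total == 0 or ref_total == 0:
            continue  # string shorter than n characters
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / hyp_total)
        recalls.append(overlap / ref_total)
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)


# Hypothetical example sentences (assumptions, not from the paper's data).
reference = "Strike while the iron is hot."
equivalent = "Make hay while the sun shines."   # same message, different wording
wrong = "Strike while the iron is cold."        # reversed meaning, similar wording

score_equivalent = simple_chrf(equivalent, reference)
score_wrong = simple_chrf(wrong, reference)

print(f"equivalent proverb: {score_equivalent:.3f}")
print(f"meaning-reversed output: {score_wrong:.3f}")
```

In this toy case the meaning-reversed output scores higher than the equivalent proverb because the score rewards shared characters rather than shared meaning. A learned metric such as COMET works differently, so this sketch illustrates only the overlap-based part of the paper's finding.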