CL AI | Oct 17, 2024

Towards Cross-Cultural Machine Translation with Retrieval-Augmented Generation from Multilingual Knowledge Graphs

Simone Conia, Daniel Lee, Min Li, Umar Farooq Minhas, Saloni Potdar, Yunyao Li

arXiv:2410.14057v117.945 citations | h-index: 24 | EMNLP

Originality: Highly original

AI Analysis

This paper addresses the challenge of cross-cultural translation for texts containing entity names, which matters for translation quality in culturally sensitive contexts, though the approach appears incremental in that it combines existing techniques.

The paper tackles the problem of machine translation for culturally-nuanced entity names by introducing XC-Translate, a new benchmark, and KG-MT, a method that integrates multilingual knowledge graphs via dense retrieval. KG-MT outperforms state-of-the-art approaches with 129% and 62% relative improvements over NLLB-200 and GPT-4, respectively.
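To make the relative-improvement figures concrete, here is a minimal sketch of the arithmetic. The function names and the example scores are hypothetical and are not taken from the paper; only the 129% and 62% figures come from the summary above.

```python
def relative_improvement(baseline: float, new: float) -> float:
    """Relative improvement of `new` over `baseline`, in percent."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (new - baseline) / baseline * 100.0


def score_ratio(relative_improvement_pct: float) -> float:
    """How many times the baseline score a given relative improvement implies."""
    return 1.0 + relative_improvement_pct / 100.0


# Hypothetical scores, chosen only to illustrate the formula.
hypothetical_baseline = 20.0
hypothetical_new = 45.8
print(relative_improvement(hypothetical_baseline, hypothetical_new))

# A 129% relative improvement means roughly 2.29x the baseline score,
# and a 62% relative improvement means roughly 1.62x.
print(score_ratio(129.0), score_ratio(62.0))
```

Read this way, a relative improvement above 100% means the new score is more than double the baseline on whatever metric is being compared.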

Translating text that contains entity names is a challenging task, as cultural-related references can vary significantly across languages. These variations may also be caused by transcreation, an adaptation process that entails more than transliteration and word-for-word translation. In this paper, we address the problem of cross-cultural translation on two fronts: (i) we introduce XC-Translate, the first large-scale, manually-created benchmark for machine translation that focuses on text that contains potentially culturally-nuanced entity names, and (ii) we propose KG-MT, a novel end-to-end method to integrate information from a multilingual knowledge graph into a neural machine translation model by leveraging a dense retrieval mechanism. Our experiments and analyses show that current machine translation systems and large language models still struggle to translate texts containing entity names, whereas KG-MT outperforms state-of-the-art approaches by a large margin, obtaining a 129% and 62% relative improvement compared to NLLB-200 and GPT-4, respectively.
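The general idea of retrieving entity names from a multilingual knowledge graph before translating can be sketched as follows. This is a toy illustration under stated assumptions, not the KG-MT architecture: the knowledge graph is a hand-written two-entry list, the learned dense encoder is replaced by character-trigram count vectors with cosine similarity, and the way the retrieved name is attached to the input (a bracketed hint prefix) is invented for the example. All names (`TOY_KG`, `embed`, `retrieve`, `augment`) are hypothetical.

```python
import math
from collections import Counter

# Toy stand-in for a multilingual knowledge graph: one name per language.
# The Italian titles are transcreations, not word-for-word translations.
TOY_KG = [
    {"en": "The Catcher in the Rye", "it": "Il giovane Holden"},
    {"en": "Home Alone", "it": "Mamma, ho perso l'aereo"},
]


def embed(text: str) -> Counter:
    """Character-trigram counts; a real system would use a learned dense encoder."""
    padded = f"  {text.lower()}  "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(source: str, src_lang: str, k: int = 1) -> list:
    """Return the k entities whose source-language name is closest to the text."""
    query = embed(source)
    ranked = sorted(
        TOY_KG, key=lambda e: cosine(query, embed(e[src_lang])), reverse=True
    )
    return ranked[:k]


def augment(source: str, src_lang: str, tgt_lang: str) -> str:
    """Prefix the source text with retrieved name pairs (hypothetical format)."""
    hits = retrieve(source, src_lang)
    hints = "; ".join(f"{e[src_lang]} -> {e[tgt_lang]}" for e in hits)
    return f"[entities: {hints}] {source}"


print(augment("Who wrote The Catcher in the Rye?", "en", "it"))
```

The augmented string would then be passed to a translation model, which can copy the retrieved target-language name instead of translating the title literally.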
