CL · AI · Oct 27, 2025

Quality-Aware Translation Tagging in Multilingual RAG system

arXiv:2510.23070v1 · 4 citations · h-index: 1 · Proceedings of the 5th Workshop on Multilingual Representation Learning (MRL 2025)
Originality: Incremental advance
AI Analysis

The paper addresses factual distortion and hallucination in mRAG for low-resource languages and offers a practical solution, though the advance appears incremental because it builds on existing mRAG frameworks.

The paper tackles the problem of poor translation quality degrading performance in multilingual Retrieval-Augmented Generation (mRAG) systems by proposing QTT-RAG, which evaluates translation quality along three dimensions and attaches scores as metadata, outperforming baselines like CrossRAG and DKM-RAG in open-domain QA benchmarks for low-resource languages.

Multilingual Retrieval-Augmented Generation (mRAG) often retrieves English documents and translates them into the query language in low-resource settings. However, poor translation quality degrades response generation. Existing approaches either assume the translation is good enough or rewrite the translated text, which introduces factual distortion and hallucinations. To mitigate these problems, we propose Quality-Aware Translation Tagging in mRAG (QTT-RAG), which explicitly evaluates translation quality along three dimensions (semantic equivalence, grammatical accuracy, and naturalness and fluency) and attaches these scores as metadata without altering the original content. We evaluate QTT-RAG against CrossRAG and DKM-RAG as baselines on two open-domain QA benchmarks (XORQA, MKQA) using six instruction-tuned LLMs ranging from 2.4B to 14B parameters, covering two low-resource languages (Korean and Finnish) and one high-resource language (Chinese). QTT-RAG outperforms the baselines by preserving factual integrity while letting generator models make informed decisions based on translation reliability. This approach makes cross-lingual documents usable in low-resource settings where native-language documents are scarce, offering a practical and robust solution across multilingual domains.
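The tagging idea in the abstract can be illustrated with a small sketch: each translated passage keeps its text untouched and carries three quality scores as metadata, which are then shown to the generator alongside the passage. This is not the authors' implementation. The class and function names, the numeric score scale, and the prompt layout are all assumptions made for illustration, and the scores are passed in by the caller rather than produced by an evaluator model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QualityScores:
    """Hypothetical container for the three dimensions named in the abstract.

    The numeric scale is an assumption; the abstract does not specify one.
    """
    semantic_equivalence: float
    grammatical_accuracy: float
    naturalness_fluency: float


@dataclass(frozen=True)
class TaggedPassage:
    """A translated passage plus its quality metadata."""
    translated_text: str
    scores: QualityScores


def tag_passage(translated_text: str, scores: QualityScores) -> TaggedPassage:
    """Attach scores as metadata; the translated text is stored unchanged."""
    return TaggedPassage(translated_text=translated_text, scores=scores)


def render_context(passages: list[TaggedPassage]) -> str:
    """Format tagged passages for a generator prompt (layout is an assumption)."""
    blocks = []
    for i, p in enumerate(passages, start=1):
        s = p.scores
        blocks.append(
            f"[Document {i}]\n"
            f"semantic_equivalence={s.semantic_equivalence:.1f} "
            f"grammatical_accuracy={s.grammatical_accuracy:.1f} "
            f"naturalness_fluency={s.naturalness_fluency:.1f}\n"
            f"{p.translated_text}"
        )
    return "\n\n".join(blocks)
```

A caller would build the context with something like `render_context([tag_passage(text, QualityScores(5.0, 4.5, 4.0))])`. Because the passage text is never rewritten, any factual content in the translation reaches the generator as retrieved, and the scores only signal how much to trust it.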

