CL, Oct 21, 2024

Analyzing Context Contributions in LLM-based Machine Translation

Emmanouil Zaranis, Nuno M. Guerreiro, André F. T. Martins

arXiv:2410.16246v1, 13.525 citations, h-index: 18, EMNLP

Originality: Incremental advance

AI Analysis

This work provides insights into the internal mechanisms of LLM-based machine translation, which could help improve translation quality and reliability for users of AI translation systems, though it is incremental in nature.

The paper analyzes how large language models use different parts of the input context, such as few-shot examples and the source text, in machine translation. It finds that the source side of few-shot examples contributes more than the target side, that finetuning alters contribution patterns, and that earlier examples contribute more than later ones. It also suggests that anomalous context contributions could help detect hallucinations.

Large language models (LLMs) have achieved state-of-the-art performance in machine translation (MT) and demonstrated the ability to leverage in-context learning through few-shot examples. However, the mechanisms by which LLMs use different parts of the input context remain largely unexplored. In this work, we provide a comprehensive analysis of context utilization in MT, studying how LLMs use various context parts, such as few-shot examples and the source text, when generating translations. We highlight several key findings: (1) the source part of few-shot examples appears to contribute more than its corresponding targets, irrespective of translation direction; (2) finetuning LLMs with parallel data alters the contribution patterns of different context parts; and (3) there is a positional bias where earlier few-shot examples have higher contributions to the translated sequence. Finally, we demonstrate that inspecting anomalous context contributions can potentially uncover pathological translations, such as hallucinations. Our findings shed light on the internal workings of LLM-based MT which go beyond those known for standard encoder-decoder MT models.
