Bridging the Linguistic Divide: A Survey on Leveraging Large Language Models for Machine Translation
This survey provides a comprehensive overview for researchers and practitioners working on machine translation, particularly in under-resourced settings. Its contribution is incremental, because it synthesizes existing progress.
This survey reviews how Large Language Models (LLMs) are applied to machine translation, focusing on techniques such as few-shot prompting and synthetic data generation for low-resource languages. It also compares LLM performance with that of traditional models and discusses challenges such as hallucinations and biases.
The advent of Large Language Models (LLMs) has significantly reshaped the landscape of machine translation (MT), particularly for low-resource languages and domains that lack sufficient parallel corpora, linguistic tools, and computational infrastructure. This survey presents a comprehensive overview of recent progress in leveraging LLMs for MT. We analyze techniques such as few-shot prompting, cross-lingual transfer, and parameter-efficient fine-tuning (e.g., LoRA, adapters) that enable effective adaptation to under-resourced settings. We also explore synthetic data generation strategies using LLMs, including back-translation and lexical augmentation. Additionally, we compare LLM-based translation with traditional encoder-decoder models across diverse language pairs, highlighting the strengths and limitations of each. We discuss persistent challenges, such as hallucinations, evaluation inconsistencies, and inherited biases, and evaluate emerging LLM-driven metrics for translation quality. This survey offers practical insights and outlines future directions for building robust, inclusive, and scalable MT systems in the era of large-scale generative models.
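As a concrete illustration of the few-shot prompting technique named above, the following is a minimal sketch of how a translation prompt might be assembled from a handful of example pairs. The function name `build_few_shot_prompt`, the prompt layout, and the example sentences are assumptions made for illustration; they are not taken from the survey, and no particular LLM API is implied.

```python
from typing import List, Tuple


def build_few_shot_prompt(
    examples: List[Tuple[str, str]],
    source_text: str,
    src_lang: str,
    tgt_lang: str,
) -> str:
    """Assemble a few-shot translation prompt (hypothetical layout).

    Each (source, target) pair is shown to the model as a demonstration,
    followed by the new source sentence with the target left open.
    """
    lines = [f"Translate from {src_lang} to {tgt_lang}."]
    for src, tgt in examples:
        lines.append(f"{src_lang}: {src}")
        lines.append(f"{tgt_lang}: {tgt}")
    # The final entry has no target, so the model is expected to complete it.
    lines.append(f"{src_lang}: {source_text}")
    lines.append(f"{tgt_lang}:")
    return "\n".join(lines)


# Illustrative demonstrations; a low-resource setting would use whatever
# small set of trusted parallel sentences is available.
demo_pairs = [("Good morning", "Bonjour")]
prompt = build_few_shot_prompt(demo_pairs, "Thank you", "English", "French")
```

The resulting string would then be passed to whichever model is being evaluated. In a low-resource setting, the choice and number of demonstration pairs is typically the main design decision, since only a few parallel sentences may exist.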