Beyond Many-Shot Translation: Scaling In-Context Demonstrations For Low-Resource Machine Translation
This work addresses the challenge of adapting large language models to low-resource languages for machine translation, but its contribution is incremental because it builds on existing in-context learning methods.
The study scales in-context learning for low-resource machine translation beyond few-shot settings to thousands of examples. It finds that gains from additional context saturate quickly and can degrade near the maximum context window, that performance depends strongly on corpus type, and that some monolingual supervision can be competitive with parallel data.
Building machine translation (MT) systems for low-resource languages is difficult because high-quality data is scarce. Although Large Language Models (LLMs) have improved MT system performance, adapting them to less-represented languages remains challenging. In-context learning (ICL) may offer novel ways to adapt LLMs for low-resource MT by conditioning models on demonstrations at inference time. In this study, we explore scaling ICL for low-resource MT beyond the few-shot setting to thousands of examples with long-context models. We scale the in-context token budget to 1M tokens and compare three types of training corpora used as in-context supervision: monolingual unsupervised data, instruction-style data, and parallel data (English-target and Indonesian-target). Our experiments on Javanese and Sundanese show that gains from additional context saturate quickly and can degrade near the maximum context window, with scaling behavior strongly dependent on corpus type. Notably, some forms of monolingual supervision can be competitive with parallel data, even though parallel data offers additional supervision. Overall, our results characterize the effective limits and corpus-type sensitivity of long-context ICL for low-resource MT, highlighting that larger context windows do not necessarily yield proportional quality gains.
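To make the idea of scaling demonstrations under a token budget concrete, the following is a minimal sketch of packing parallel demonstrations into a prompt until a budget is reached. It is an illustration only, not the authors' pipeline: the function names `count_tokens` and `pack_demonstrations`, the prompt layout, and the whitespace-based token count are all assumptions. A real setup would count tokens with the target model's own tokenizer.

```python
from typing import Iterable, Tuple


def count_tokens(text: str) -> int:
    # Hypothetical stand-in for a model tokenizer: counts whitespace-separated words.
    return len(text.split())


def pack_demonstrations(
    pairs: Iterable[Tuple[str, str]],
    source_sentence: str,
    budget: int,
    src_label: str = "English",
    tgt_label: str = "Javanese",
) -> Tuple[str, int]:
    """Greedily add (source, target) demonstrations until the token budget is hit.

    Returns the assembled prompt and the number of demonstrations included.
    The prompt layout is an assumption made for this sketch.
    """
    # The query to translate is always included, so it is charged first.
    query = f"{src_label}: {source_sentence}\n{tgt_label}:"
    used = count_tokens(query)

    blocks = []
    for src, tgt in pairs:
        block = f"{src_label}: {src}\n{tgt_label}: {tgt}"
        cost = count_tokens(block)
        if used + cost > budget:
            break  # stop at the first demonstration that would exceed the budget
        blocks.append(block)
        used += cost

    # Demonstrations come first, the query last.
    prompt = "\n\n".join(blocks + [query])
    return prompt, len(blocks)


if __name__ == "__main__":
    # Placeholder strings, not real translations.
    demo_pairs = [("source one", "target one"), ("source two", "target two")]
    prompt, n_used = pack_demonstrations(demo_pairs, "source query", budget=1000)
    print(n_used)
    print(prompt)
```

Sweeping `budget` over increasing values and scoring the resulting translations is one way to trace the kind of saturation curve the abstract describes. Swapping the parallel pairs for monolingual or instruction-style text would require a different block format.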