Self-Augmented In-Context Learning for Unsupervised Word Translation
This work addresses the challenge of improving LLM performance on unsupervised bilingual lexicon induction (BLI) across a range of language pairs, proposing a new method for a known bottleneck.
The paper tackles unsupervised word translation without seed pairs, a setting where LLMs lag behind mapping-based approaches, especially for lower-resource languages. It proposes self-augmented in-context learning (SAIL), which iteratively induces high-confidence translation pairs from an LLM and feeds them back to the same LLM as in-context examples. SAIL yields substantial gains over zero-shot prompting and outperforms mapping-based baselines on two established BLI benchmarks.
Recent work has shown that, while large language models (LLMs) demonstrate strong word translation or bilingual lexicon induction (BLI) capabilities in few-shot setups, they still cannot match the performance of 'traditional' mapping-based approaches in the unsupervised scenario where no seed translation pairs are available, especially for lower-resource languages. To address this challenge with LLMs, we propose self-augmented in-context learning (SAIL) for unsupervised BLI: starting from a zero-shot prompt, SAIL iteratively induces a set of high-confidence word translation pairs for in-context learning (ICL) from an LLM, which it then reapplies to the same LLM in the ICL fashion. Our method shows substantial gains over zero-shot prompting of LLMs on two established BLI benchmarks spanning a wide range of language pairs, also outperforming mapping-based baselines across the board. In addition to achieving state-of-the-art unsupervised BLI performance, we also conduct comprehensive analyses on SAIL and discuss its limitations.
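The loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not say how "high-confidence" pairs are selected, so the round-trip (back-translation) check used here is an assumption. The `translate` callable, the `sail` and `infer` functions, and the dictionary-backed `toy_translate` stub that stands in for an LLM are all hypothetical names introduced for illustration.

```python
from typing import Callable, Dict, List, Tuple

Pair = Tuple[str, str]
# Hypothetical LLM interface: (word, forward?, in-context pairs) -> translation.
Translator = Callable[[str, bool, List[Pair]], str]


def sail(source_words: List[str], translate: Translator,
         n_iterations: int = 1) -> List[Pair]:
    """Induce in-context translation pairs, starting from a zero-shot prompt."""
    examples: List[Pair] = []  # first pass is zero-shot: no in-context pairs
    for _ in range(n_iterations):
        confident: List[Pair] = []
        reversed_examples = [(tgt, src) for src, tgt in examples]
        for word in source_words:
            candidate = translate(word, True, examples)
            # ASSUMPTION: a pair counts as high-confidence if translating the
            # candidate back recovers the original source word.
            if translate(candidate, False, reversed_examples) == word:
                confident.append((word, candidate))
        examples = confident  # reapply the induced pairs in the next pass
    return examples


def infer(test_words: List[str], translate: Translator,
          examples: List[Pair]) -> Dict[str, str]:
    """Translate test words with the induced pairs as in-context examples."""
    return {word: translate(word, True, examples) for word in test_words}


# Toy stand-in for an LLM (hypothetical): a lookup table that makes one
# mistake when it is given no in-context examples.
FWD = {"dog": "hund", "cat": "katze", "house": "haus"}
BWD = {tgt: src for src, tgt in FWD.items()}


def toy_translate(word: str, forward: bool, examples: List[Pair]) -> str:
    if not examples and word == "house":
        return "heim"  # simulated zero-shot error
    table = FWD if forward else BWD
    return table.get(word, word)


induced = sail(["dog", "cat", "house"], toy_translate, n_iterations=1)
zero_shot = infer(["house"], toy_translate, [])
with_sail = infer(["house"], toy_translate, induced)
```

In this toy run, the zero-shot pass mistranslates "house", the round-trip check rejects that pair, and the two surviving pairs serve as in-context examples for the final prompt. With a real LLM, `translate` would build a prompt from `examples` and parse the model's output; those details are left out here because the abstract does not specify them.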