CL

Feb 2, 2023

The unreasonable effectiveness of few-shot learning for machine translation

Xavier Garcia, Yamini Bansal, Colin Cherry, George Foster, Maxim Krikun, Fangxiaoyu Feng, Melvin Johnson, Orhan Firat

Stanford

arXiv:2302.01398v1

18.5

135 citations

h-index: 46

Originality: Highly original

AI Analysis

This paper addresses data efficiency and controllability in machine translation for both high and low-resource languages, representing a significant advance rather than an incremental improvement.

The paper shows that a few-shot approach using only 5 examples at inference can match state-of-the-art supervised and commercial systems. It outperforms the best-performing system on the WMT'21 English-Chinese news translation task and enables control over attributes such as regional varieties and formality.

We demonstrate the potential of few-shot translation systems, trained with unpaired language data, for both high and low-resource language pairs. We show that with only 5 examples of high-quality translation data shown at inference, a transformer decoder-only model trained solely with self-supervised learning is able to match specialized supervised state-of-the-art models as well as more general commercial translation systems. In particular, we outperform the best performing system on the WMT'21 English - Chinese news translation task by only using five examples of English - Chinese parallel data at inference. Moreover, our approach in building these models does not necessitate joint multilingual training or back-translation, is conceptually simple and shows the potential to extend to the multilingual setting. Furthermore, the resulting models are two orders of magnitude smaller than state-of-the-art language models. We then analyze the factors which impact the performance of few-shot translation systems, and highlight that the quality of the few-shot demonstrations heavily determines the quality of the translations generated by our models. Finally, we show that the few-shot paradigm also provides a way to control certain attributes of the translation -- we show that we are able to control for regional varieties and formality using only five examples at inference, paving the way towards controllable machine translation systems.
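The abstract's idea of showing five translation pairs at inference can be illustrated with a minimal sketch of prompt construction. The prompt layout, the function name `build_few_shot_prompt`, and the example sentence pairs below are assumptions made for illustration only; the abstract does not specify the template the authors use, and no model call is included.

```python
from typing import List, Tuple


def build_few_shot_prompt(
    demonstrations: List[Tuple[str, str]],
    source_sentence: str,
    source_lang: str = "English",
    target_lang: str = "Chinese",
) -> str:
    """Join demonstration pairs and the new source sentence into one prompt.

    The "<language>: <text>" layout is a hypothetical template, not the
    format used in the paper.
    """
    blocks = [
        f"{source_lang}: {src}\n{target_lang}: {tgt}"
        for src, tgt in demonstrations
    ]
    # Leave the final target side empty so a decoder-only model continues it.
    blocks.append(f"{source_lang}: {source_sentence}\n{target_lang}:")
    return "\n\n".join(blocks)


# Five illustrative (made-up) demonstration pairs, matching the 5-shot setup.
demos = [
    ("Good morning.", "早上好。"),
    ("Thank you very much.", "非常感谢。"),
    ("I like reading books.", "我喜欢读书。"),
    ("The weather is nice today.", "今天天气很好。"),
    ("See you tomorrow.", "明天见。"),
]

prompt = build_few_shot_prompt(demos, "Where is the train station?")
print(prompt)
```

Under the same assumed template, swapping the demonstrations for pairs written in a particular regional variety or formality level is how the attribute control described in the abstract would be expressed: the demonstrations change, the model does not.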
