CL Mar 4, 2024

Analyzing and Adapting Large Language Models for Few-Shot Multilingual NLU: Are We There Yet?

Evgeniia Razumovskaia, Ivan Vulić, Anna Korhonen

arXiv:2403.01929v17.715 citations h-index: 10

Originality: Incremental advance

AI Analysis

This work addresses the problem of selecting efficient few-shot learning methods for multilingual NLU, providing practical insights for researchers and practitioners, though it is incremental in comparing existing approaches.

The paper systematically compares supervised fine-tuning, supervised instruction tuning, and in-context learning for few-shot multilingual NLU across 6 languages and 3 tasks. It finds that supervised instruction tuning offers the best trade-off between performance and resource requirements. Target language adaptation improves generation but not understanding, and scores remain low, especially for low-resource languages.

Supervised fine-tuning (SFT), supervised instruction tuning (SIT) and in-context learning (ICL) are three alternative, de facto standard approaches to few-shot learning. ICL has gained popularity recently with the advent of LLMs due to its simplicity and sample efficiency. Prior research has conducted only limited investigation into how these approaches work for multilingual few-shot learning, and the focus so far has been mostly on their performance. In this work, we present an extensive and systematic comparison of the three approaches, testing them on six high- and low-resource languages, three different NLU tasks, and a myriad of language and domain setups. Importantly, performance is only one aspect of the comparison: we also analyse the approaches through the lens of their computational, inference and financial costs. Our observations show that supervised instruction tuning has the best trade-off between performance and resource requirements. As another contribution, we analyse the impact of target language adaptation of pretrained LLMs and find that the standard adaptation approaches can (superficially) improve target language generation capabilities, but language understanding elicited through ICL does not improve and remains limited, with low scores especially for low-resource languages.
