CL | AI | Feb 22, 2024

Zero-shot cross-lingual transfer in instruction tuning of large language models

arXiv:2402.14778v2 | 26 citations | h-index: 15 | INLG
AI Analysis

This paper addresses multilingual instruction following for users who need AI assistance in non-English languages. The contribution is incremental, since it builds on existing instruction tuning methods.

The study systematically investigates zero-shot cross-lingual transfer in instruction tuning of large language models, finding that models trained on English-only data can generate correct-language and helpful responses in other languages, but with low factuality and occasional fluency errors.

Instruction tuning (IT) is widely used to teach pretrained large language models (LLMs) to follow arbitrary instructions, but is under-studied in multilingual settings. In this work, we conduct a systematic study of zero-shot cross-lingual transfer in IT, when an LLM is instruction-tuned on English-only data and then tested on user prompts in other languages. We advocate for the importance of evaluating various aspects of model responses in multilingual instruction following and investigate the influence of different model configuration choices. We find that cross-lingual transfer does happen successfully in IT even if all stages of model training are English-centric, but only if multilinguality is taken into account in hyperparameter tuning and the IT data is large enough. English-trained LLMs are capable of generating correct-language, comprehensive and helpful responses in other languages, but suffer from low factuality and may occasionally have fluency errors.

