CL · AI · LG · Nov 4, 2024

Code-Switching Curriculum Learning for Multilingual Transfer in LLMs

Haneul Yoo, Cheonbok Park, Sangdoo Yun, Alice Oh, Hwaran Lee

arXiv:2411.02460v2 · 12.927 citations · h-index: 9 · ACL

Originality: Incremental advance

AI Analysis

This work addresses imbalanced multilingual transfer in LLMs and particularly benefits low-resource languages. It is incremental, however, because it builds on existing curriculum learning and code-switching ideas.

The paper tackles the performance drop that large language models (LLMs) show on languages beyond a few high-resource ones. It proposes code-switching curriculum learning (CSCL), which mimics human second language acquisition through progressive training stages and yields significant performance gains for languages such as Korean compared with monolingual methods.

Large language models (LLMs) now exhibit near human-level performance in various tasks, but their performance drops drastically beyond a handful of high-resource languages due to the imbalance in pre-training data. Inspired by the human process of second language acquisition, particularly code-switching (the practice of alternating languages in a conversation), we propose code-switching curriculum learning (CSCL) to enhance cross-lingual transfer for LLMs. CSCL mimics the stages of human language learning by progressively training models with a curriculum consisting of 1) token-level code-switching, 2) sentence-level code-switching, and 3) monolingual corpora. Using Qwen 2 as our underlying model, we demonstrate the efficacy of CSCL in improving language transfer to Korean, achieving significant performance gains compared to monolingual continual pre-training methods. Ablation studies reveal that both token- and sentence-level code-switching significantly enhance cross-lingual transfer and that curriculum learning amplifies these effects. We also extend our findings to other languages, including Japanese (high-resource) and Indonesian (low-resource), and to two additional models (Gemma 2 and Phi 3.5). We further show that CSCL mitigates spurious correlations between language resources and safety alignment, presenting a robust, efficient framework for more equitable language transfer in LLMs. We observe that CSCL is effective in low-resource settings where high-quality monolingual corpora for language transfer are scarce.
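The three-stage curriculum in the abstract (token-level code-switching, then sentence-level code-switching, then monolingual text) can be illustrated with a minimal data-construction sketch. This is not the authors' data pipeline. It assumes a hypothetical input format: each document is given as aligned source/target sentence pairs with word alignments already available. The names `ParallelDoc`, `token_level_switch`, `sentence_level_switch`, `monolingual`, `build_curriculum`, and the default switching `ratio` are all illustrative assumptions, and the actual continual pre-training on each stage is omitted.

```python
import random
from dataclasses import dataclass


@dataclass
class ParallelDoc:
    # Hypothetical input format (assumption): a document as aligned sentence
    # pairs, each sentence pre-tokenized into words.
    src_sents: list[list[str]]
    tgt_sents: list[list[str]]
    # alignments[i] maps a source-token index to a target-token index in sentence i.
    alignments: list[dict[int, int]]


def token_level_switch(doc: ParallelDoc, ratio: float, rng: random.Random) -> str:
    """Stage 1 (assumed scheme): inside each source sentence, replace aligned
    tokens with their target-language counterparts with probability `ratio`."""
    sents = []
    for src, tgt, align in zip(doc.src_sents, doc.tgt_sents, doc.alignments):
        mixed = [
            tgt[align[i]] if i in align and rng.random() < ratio else tok
            for i, tok in enumerate(src)
        ]
        sents.append(" ".join(mixed))
    return " ".join(sents)


def sentence_level_switch(doc: ParallelDoc, ratio: float, rng: random.Random) -> str:
    """Stage 2 (assumed scheme): keep sentences intact, but choose each
    sentence's language at random (target with probability `ratio`)."""
    sents = []
    for src, tgt in zip(doc.src_sents, doc.tgt_sents):
        sents.append(" ".join(tgt if rng.random() < ratio else src))
    return " ".join(sents)


def monolingual(doc: ParallelDoc) -> str:
    """Stage 3: plain target-language text only."""
    return " ".join(" ".join(t) for t in doc.tgt_sents)


def build_curriculum(docs: list[ParallelDoc], ratio: float = 0.5, seed: int = 0):
    """Return the training stages in curriculum order, from finest-grained
    mixing to none. Each stage would be used for continual pre-training in turn."""
    rng = random.Random(seed)
    return [
        ("token", [token_level_switch(d, ratio, rng) for d in docs]),
        ("sentence", [sentence_level_switch(d, ratio, rng) for d in docs]),
        ("monolingual", [monolingual(d) for d in docs]),
    ]


if __name__ == "__main__":
    doc = ParallelDoc(
        src_sents=[["I", "like", "apples"], ["It", "is", "sunny"]],
        tgt_sents=[["나는", "사과를", "좋아한다"], ["날씨가", "맑다"]],
        alignments=[{0: 0, 1: 2, 2: 1}, {2: 1}],
    )
    for name, texts in build_curriculum([doc], ratio=0.5):
        print(name, texts)
```

The ordering encodes the curriculum idea: the model first sees target-language words embedded in familiar source-language context, then whole target-language sentences alongside source ones, and only then purely target-language text. Joining tokens with spaces is a simplification for the sketch; a real pipeline would operate on the model's own tokenization.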

