CL · AI · Nov 15, 2023

How Vocabulary Sharing Facilitates Multilingualism in LLaMA?

CMU
arXiv:2311.09071v2 · 34 citations · h-index: 60 · Has Code
Originality: Incremental advance
AI Analysis

This work addresses the problem of limited multilingual performance in LLMs for researchers and practitioners, offering incremental improvements through targeted tuning strategies.

The study investigates how vocabulary sharing affects the multilingual capability of large language models, using LLaMA as the test case across 101 languages. It reports that existing models have stronger multilingual abilities than expected, and it derives tuning guidelines for each of four quadrants identified from the performance gap before and after embedding fine-tuning.

Large Language Models (LLMs) often show strong performance on English tasks while exhibiting limitations on other languages. What is an LLM's multilingual capability when it is trained only on certain languages? The underlying mechanism remains unclear. This study examines the multilingual capability of LLMs from the vocabulary sharing perspective by conducting an exhaustive analysis across 101 languages. By investigating the performance gap before and after embedding fine-tuning, we discovered four distinct quadrants. By delving into each quadrant, we provide actionable and efficient guidelines for tuning these languages. Extensive experiments reveal that existing LLMs possess multilingual capabilities that surpass our expectations, and that the multilingual performance of LLMs can be significantly improved based on the attributes of each quadrant. Code: https://github.com/CONE-MT/Vocabulary-Sharing-Facilitates-Multilingualism
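
The abstract describes grouping languages into four quadrants from the performance gap before and after embedding fine-tuning, but it does not state which two axes define the quadrants. The sketch below is only an illustration under one assumption: that the axes are (a) the score change on the tuned language itself and (b) the average score change on the other languages. The names `Scores`, `quadrant`, and the `threshold` parameter are hypothetical and do not come from the paper or its repository.

```python
from dataclasses import dataclass


@dataclass
class Scores:
    """Scores for one tuned language, before and after embedding fine-tuning.

    ASSUMPTION: the two axes (own language vs. other languages) are an
    illustrative guess; the abstract does not define them.
    """
    own_before: float
    own_after: float
    others_before: float
    others_after: float


def quadrant(s: Scores, threshold: float = 0.0) -> str:
    """Label a language by the sign of its two performance gaps."""
    own_gain = s.own_after - s.own_before
    others_gain = s.others_after - s.others_before
    own_up = own_gain > threshold
    others_up = others_gain > threshold
    if own_up and others_up:
        return "own+ / others+"
    if own_up:
        return "own+ / others-"
    if others_up:
        return "own- / others+"
    return "own- / others-"


if __name__ == "__main__":
    # Made-up numbers, purely to exercise the function.
    example = Scores(own_before=10.0, own_after=15.0,
                     others_before=20.0, others_after=22.0)
    print(quadrant(example))
```

A gap of exactly zero is treated as "no gain" here; a real analysis would likely need a tolerance that reflects evaluation noise, which is why `threshold` is exposed as a parameter.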

Code Implementations: 1 repo
