How do Large Language Models Handle Multilingualism?
This work aims to explain and improve the multilingual capabilities of LLMs. For AI researchers and practitioners, it offers a targeted fine-tuning method that improves performance in one language without compromising others.
The study investigates how large language models (LLMs) process multilingual inputs. It proposes a workflow in which LLMs convert queries into English for task-solving and then generate responses in the original language. The authors validate this workflow with a neuron detection method that also enables targeted fine-tuning, yielding average improvements of 3.6% for high-resource languages and 2.3% for low-resource languages using only 400 documents.
Large language models (LLMs) have demonstrated impressive capabilities across diverse languages. This study explores how LLMs handle multilingualism. Based on observed language ratio shifts among layers and the relationships between network structures and certain capabilities, we hypothesize the LLM's multilingual workflow ($\texttt{MWork}$): LLMs initially understand the query, converting multilingual inputs into English for task-solving. In the intermediate layers, they employ English for thinking and incorporate multilingual knowledge with self-attention and feed-forward structures, respectively. In the final layers, LLMs generate responses aligned with the original language of the query. To verify $\texttt{MWork}$, we introduce Parallel Language-specific Neuron Detection ($\texttt{PLND}$) to identify activated neurons for inputs in different languages without any labeled data. Using $\texttt{PLND}$, we validate $\texttt{MWork}$ through extensive experiments involving the deactivation of language-specific neurons across various layers and structures. Moreover, $\texttt{MWork}$ allows fine-tuning of language-specific neurons with a small dataset, enhancing multilingual abilities in a specific language without compromising others. This approach results in an average improvement of $3.6\%$ for high-resource languages and $2.3\%$ for low-resource languages across all tasks with just $400$ documents.
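To make the detect, deactivate, and fine-tune loop described above more concrete, here is a minimal toy sketch. It is not the paper's $\texttt{PLND}$ procedure, whose details the abstract does not give. It assumes, purely for illustration, that a neuron counts as "activated" for a language when its mean absolute activation over that language's unlabeled documents exceeds a threshold, and that language-specific neurons are those activated for one language but not for all of them. The function names, the threshold, and the toy activations are all hypothetical.

```python
from typing import Dict, List, Set


def activated_neurons(activations: List[List[float]], threshold: float) -> Set[int]:
    """Neurons whose mean absolute activation over a corpus exceeds `threshold`.

    `activations` is a documents x neurons table. The thresholding rule is an
    assumption made for this sketch, not the criterion used in the paper.
    """
    n_docs = len(activations)
    n_neurons = len(activations[0])
    active = set()
    for j in range(n_neurons):
        mean_abs = sum(abs(row[j]) for row in activations) / n_docs
        if mean_abs > threshold:
            active.add(j)
    return active


def language_specific_neurons(
    per_language: Dict[str, List[List[float]]], threshold: float
) -> Dict[str, Set[int]]:
    """Per language, keep activated neurons that are not shared by every language."""
    active = {lang: activated_neurons(acts, threshold) for lang, acts in per_language.items()}
    shared = set.intersection(*active.values())
    return {lang: neurons - shared for lang, neurons in active.items()}


def deactivate(hidden: List[float], neurons: Set[int]) -> List[float]:
    """Zero out the selected neurons in one hidden vector (an ablation probe)."""
    return [0.0 if j in neurons else h for j, h in enumerate(hidden)]


def masked_update(
    weights: List[float], grads: List[float], neurons: Set[int], lr: float
) -> List[float]:
    """Apply a gradient step only to the selected neurons; leave the rest frozen."""
    return [w - lr * g if j in neurons else w for j, (w, g) in enumerate(zip(weights, grads))]


# Toy activations (hypothetical): two documents per language, four neurons.
toy = {
    "en": [[0.9, 0.8, 0.0, 0.1], [1.1, 0.7, 0.1, 0.0]],
    "fr": [[1.0, 0.0, 0.9, 0.1], [0.8, 0.1, 1.2, 0.0]],
}
specific = language_specific_neurons(toy, threshold=0.5)
ablated = deactivate([0.5, 0.4, 0.3, 0.2], specific["fr"])
updated = masked_update([1.0, 1.0, 1.0, 1.0], [0.5, 0.5, 0.5, 0.5], specific["fr"], lr=0.1)
```

In this toy setup, neuron 0 fires for both languages and is treated as shared, so `specific` maps `"en"` to `{1}` and `"fr"` to `{2}`. Deactivating or updating only `specific["fr"]` leaves every other neuron untouched, which mirrors the idea of improving one language without compromising others.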