CLCY Jun 15, 2024

Multilingual Large Language Models and Curse of Multilinguality

arXiv:2406.10602v2 · 17 citations
Originality: Synthesis-oriented
AI Analysis

The paper offers a foundational resource for NLP researchers and practitioners by surveying existing multilingual models and their challenges. Its contribution is incremental: it reviews and synthesizes current knowledge without introducing new methods or results.

This paper provides an introductory overview of multilingual large language models, explaining their technical aspects and addressing the curse of multilinguality as a significant limitation.

Multilingual Large Language Models (LLMs) have gained wide popularity among Natural Language Processing (NLP) researchers and practitioners. These models, trained on huge datasets, show proficiency across various languages and demonstrate effectiveness in numerous downstream tasks. This paper navigates the landscape of multilingual LLMs, providing an introductory overview of their technical aspects. It explains the underlying architectures, objective functions, pre-training data sources, and tokenization methods. It also explores the distinguishing features of three model types: encoder-only (mBERT, XLM-R), decoder-only (XGLM, PaLM, BLOOM, GPT-3), and encoder-decoder (mT5, mBART). Finally, it addresses one of the significant limitations of multilingual LLMs, the curse of multilinguality, and discusses current attempts to overcome it.
