CLAIDec 11, 2024

The Roles of English in Evaluating Multilingual Language Models

arXiv:2412.08392v116 citationsh-index: 15NoDaLiDa/Baltic-HLT
AI Analysis

This addresses the problem of imprecise evaluation practices in multilingual NLP for researchers and practitioners, but it is incremental as it critiques existing methods without introducing new solutions.

The paper identifies two roles of English in multilingual language model evaluations—as an interface and as a natural language—and argues that this discrepancy confuses task performance with language understanding, recommending a shift toward methods that better assess language understanding.

Multilingual natural language processing is getting increased attention, with numerous models, benchmarks, and methods being released for many languages. English is often used in multilingual evaluation to prompt language models (LMs), mainly to overcome the lack of instruction tuning data in other languages. In this position paper, we lay out two roles of English in multilingual LM evaluations: as an interface and as a natural language. We argue that these roles have different goals: task performance versus language understanding. This discrepancy is highlighted with examples from datasets and evaluation setups. Numerous works explicitly use English as an interface to boost task performance. We recommend to move away from this imprecise method and instead focus on furthering language understanding.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes