CL · Apr 14

Universal NER v2: Towards a Massively Multilingual Named Entity Recognition Benchmark

Terra Blevins, Stephen Mayhew, Marek Šuppa, Hila Gonen, Shachar Mirkin, Vasile Pais, Kaja Dobrovoljc, Voula Giouli, Jun Kevin, Eugene Jang, Eungseo Kim, Jeongyeon Seo

arXiv:2604.12744 · 94.7 · h-index: 22

Predicted impact: top 14% in CL · last 90 days · Originality: Synthesis-oriented

AI Analysis

Universal NER v2 provides a gold-standard evaluation benchmark for multilingual NER, addressing the lack of such resources for most languages.

The paper introduces Universal NER v2, an expanded multilingual NER benchmark with standardized annotations across many languages, building on the first release in 2024.

While multilingual language models promise to bring the benefits of LLMs to speakers of many languages, gold-standard evaluation benchmarks in most languages to interrogate these assumptions remain scarce. The Universal NER project, now entering its fourth year, is dedicated to building gold-standard multilingual Named Entity Recognition (NER) benchmark datasets. Inspired by existing massively multilingual efforts for other core NLP tasks (e.g., Universal Dependencies), the project uses a general tagset and thorough annotation guidelines to collect standardized, cross-lingual annotations of named entity spans. The first installment (UNER v1) was released in 2024, and the project has continued and expanded since then, with various organizers, annotators, and collaborators in an active community.
