CL · Nov 15, 2023

Universal NER: A Gold-Standard Multilingual Named Entity Recognition Benchmark

Cambridge · UW
arXiv:2311.09122v3 · 45 citations · h-index: 22
Originality: Synthesis-oriented
AI Analysis

This work addresses the need for standardized, high-quality multilingual NER datasets to support research in natural language processing. It is incremental: it builds on existing NER work by extending it to more languages.

The authors introduced Universal NER (UNER), a gold-standard multilingual benchmark for named entity recognition, comprising 18 datasets across 12 languages with cross-lingually consistent annotations, and provided initial modeling baselines for in-language and cross-lingual settings.

We introduce Universal NER (UNER), an open, community-driven project to develop gold-standard NER benchmarks in many languages. The overarching goal of UNER is to provide high-quality, cross-lingually consistent annotations to facilitate and standardize multilingual NER research. UNER v1 contains 18 datasets annotated with named entities in a cross-lingually consistent schema across 12 diverse languages. In this paper, we detail the dataset creation and composition of UNER; we also provide initial modeling baselines for both in-language and cross-lingual learning settings. We release the data, code, and fitted models to the public.
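Because every dataset shares one annotation schema, in-language and cross-lingual baselines can be scored the same way. The sketch below is a generic illustration, not the authors' released evaluation code: it assumes token-level IOB2 tags with labels such as PER, ORG and LOC, and computes standard exact-match, span-level micro F1, a common NER metric. The function names `iob2_spans` and `span_micro_f1` are hypothetical.

```python
def iob2_spans(tags):
    """Decode a list of IOB2 tags into a set of (label, start, end) spans, end exclusive.

    Assumption: tags look like "O", "B-PER", "I-PER", ... (IOB2 scheme).
    An I- tag that does not continue the current span is leniently treated
    as the start of a new span.
    """
    spans = set()
    label, start = None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # trailing "O" closes any open span
        continues = tag.startswith("I-") and tag[2:] == label
        if not continues:
            if label is not None:
                spans.add((label, start, i))  # close the span that just ended
            label, start = None, None
            if tag.startswith(("B-", "I-")):
                label, start = tag[2:], i  # open a new span
    return spans


def span_micro_f1(gold_sents, pred_sents):
    """Exact-match span-level micro F1 over parallel lists of tag sequences."""
    tp = fp = fn = 0
    for gold, pred in zip(gold_sents, pred_sents):
        g, p = iob2_spans(gold), iob2_spans(pred)
        tp += len(g & p)  # spans with identical label and boundaries
        fp += len(p - g)  # predicted spans absent from gold
        fn += len(g - p)  # gold spans the model missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


gold = [["B-PER", "I-PER", "O", "B-LOC"]]
pred = [["B-PER", "I-PER", "O", "B-ORG"]]
print(span_micro_f1(gold, pred))  # one correct span, one wrong label -> 0.5
```

Under this metric a span counts only if both its boundaries and its label match, so predicting ORG where the gold label is LOC costs both a false positive and a false negative.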

Code Implementations: 2 repos