CL, Oct 23, 2020

UNER: Universal Named-Entity Recognition Framework

Diego Alves, Tin Kuculo, Gabriel Amaral, Gaurish Thakkar, Marko Tadic

arXiv:2010.12406v1, 0.87 citations

Originality: Synthesis-oriented

AI Analysis

The paper addresses the need for standardized NER across languages, but its contribution is incremental because it builds on existing tools and datasets.

The authors address multilingual named-entity recognition by introducing the UNER framework and describing how they are building the first multilingual UNER corpus from the SETimes parallel corpus. The methodology relies on automatic annotation propagation across languages, and the authors plan to use the resulting dataset to train and test NER tools.

We introduce the Universal Named-Entity Recognition (UNER) framework, a 4-level classification hierarchy, and the methodology that is being adopted to create the first multilingual UNER corpus: the SETimes parallel corpus annotated for named-entities. First, the English SETimes corpus will be annotated using existing tools and knowledge bases. After evaluating the resulting annotations through crowdsourcing campaigns, they will be propagated automatically to other languages within the SETimes corpora. Finally, as an extrinsic evaluation, the UNER multilingual dataset will be used to train and test available NER tools. As part of future research directions, we aim to increase the number of languages in the UNER corpus and to investigate possible ways of integrating UNER with available knowledge graphs to improve named-entity recognition.
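The abstract does not say how annotations are propagated from English to the other languages. As a rough illustration of the general idea only, the sketch below assumes a sentence-aligned parallel corpus and carries an entity over when its English surface form appears verbatim in the aligned target sentence. The `Entity` class, the `propagate` function, and the `LOC` label are hypothetical names chosen for this example; they are not the authors' method or the UNER hierarchy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    text: str   # surface form as it appears in the English sentence
    label: str  # entity type; label names here are placeholders, not UNER classes


def propagate(entities, target_sentence):
    """Project English entity annotations onto an aligned target sentence.

    Assumption: an entity is carried over only if its surface form occurs
    verbatim (case-sensitive) in the target sentence. Only the first
    occurrence is annotated. Returns sorted (start, end, label) character
    spans in the target sentence.
    """
    spans = []
    for ent in entities:
        start = target_sentence.find(ent.text)
        if start != -1:
            spans.append((start, start + len(ent.text), ent.label))
    return sorted(spans)


# Hypothetical aligned pair: entities found in an English sentence,
# projected onto its Croatian counterpart.
english_entities = [Entity("Zagreb", "LOC"), Entity("London", "LOC")]
croatian_sentence = "Zagreb je bio domaćin summita."
spans = propagate(english_entities, croatian_sentence)
```

Exact string matching misses inflected or transliterated names, which are common in the languages SETimes covers, so a practical pipeline would likely need word alignment or lemmatization on top of this.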
