CL · Sep 3, 2019

Introducing RONEC -- the Romanian Named Entity Corpus

arXiv:1909.01247v2 · 30 citations · Has Code
AI Analysis

RONEC is the first named entity corpus for Romanian, addressing a gap for NLP researchers and practitioners working with this language.

The authors tackled the lack of a dedicated named entity recognition corpus for Romanian by creating RONEC, which contains over 26,000 entities across 16 classes in approximately 5,000 annotated sentences.

We present RONEC - the Named Entity Corpus for the Romanian language. The corpus contains over 26000 entities in ~5000 annotated sentences, belonging to 16 distinct classes. The sentences have been extracted from a copyright-free newspaper, covering several styles. This corpus represents the first initiative in the Romanian language space specifically targeted for named entity recognition. It is available in BRAT and CoNLL-U Plus formats, and it is free to use and extend at github.com/dumitrescustefan/ronec.
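Since the corpus is distributed in CoNLL-U Plus, a reader for that format might look roughly like the sketch below. It relies only on the general CoNLL-U Plus conventions (a `# global.columns = ...` header naming the columns, tab-separated token lines, blank lines between sentences). The function name `read_conllup`, the column name `EXAMPLE:NER`, and the sample tokens and labels are hypothetical and are not taken from the RONEC files; check the repository for the actual column layout.

```python
from typing import Dict, Iterator, List


def read_conllup(text: str) -> Iterator[List[Dict[str, str]]]:
    """Yield each sentence as a list of {column name: value} dicts."""
    columns: List[str] = []
    sentence: List[Dict[str, str]] = []
    for line in text.splitlines():
        if line.startswith("# global.columns ="):
            # The header lists the column names, separated by whitespace.
            columns = line.split("=", 1)[1].split()
        elif line.startswith("#"):
            # Other comment lines (sentence ids, raw text) are skipped here.
            continue
        elif not line.strip():
            # A blank line closes the current sentence.
            if sentence:
                yield sentence
                sentence = []
        else:
            # Token line: tab-separated values, one per declared column.
            sentence.append(dict(zip(columns, line.split("\t"))))
    if sentence:
        # Handle a final sentence that is not followed by a blank line.
        yield sentence


# Hypothetical sample; the column name and labels are illustrative only.
SAMPLE = (
    "# global.columns = ID FORM EXAMPLE:NER\n"
    "1\tIon\tB-PERSON\n"
    "2\tdoarme\tO\n"
    "\n"
)

sentences = list(read_conllup(SAMPLE))
for token in sentences[0]:
    print(token["FORM"], token["EXAMPLE:NER"])
```

Reading the column names from the header rather than hard-coding positions keeps the sketch usable even if the entity column sits at a different index than assumed.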

Code Implementations: 2 repos