CL · Aug 30, 2021

NEREL: A Russian Dataset with Nested Named Entities, Relations and Events

arXiv:2108.13112v2 · 659 citations · Has Code
AI Analysis

NEREL addresses the need for comprehensive Russian NLP resources and enables the development of models for nested entities and document-level relations. The contribution is incremental, however, since it builds on existing dataset concepts.

The authors introduce NEREL, a Russian dataset for named entity recognition and relation extraction. It is larger than existing Russian datasets, with 56K annotated named entities and 39K annotated relations, and it annotates nested entities as well as relations within nested entities and at the discourse level.

In this paper, we present NEREL, a Russian dataset for named entity recognition and relation extraction. NEREL is significantly larger than existing Russian datasets: to date it contains 56K annotated named entities and 39K annotated relations. An important difference from previous datasets is the annotation of nested named entities, as well as relations within nested entities and at the discourse level. NEREL can facilitate development of novel models that can extract relations between nested named entities, as well as relations on both sentence and document levels. NEREL also contains the annotation of events involving named entities and their roles in the events. The NEREL collection is available via https://github.com/nerel-ds/NEREL.
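To make "nested entities" and "sentence- vs document-level relations" concrete, here is a minimal sketch. It assumes the annotations use the BRAT standoff convention (a `.txt` file plus a `.ann` file with `T` entity lines and `R` relation lines); check that assumption against the repository before relying on it. The sample sentence, the entity and relation type names, and all helper names (`parse_ann`, `nested_pairs`, `relation_level`, `sentence_ends`) are hypothetical and chosen only for illustration. They are not taken from NEREL. Nesting is detected purely from character-span containment, and a relation counts as document-level when its two arguments fall in different sentences.

```python
from __future__ import annotations

import bisect
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    id: str
    type: str
    start: int  # inclusive character offset
    end: int    # exclusive character offset
    text: str


@dataclass(frozen=True)
class Relation:
    id: str
    type: str
    arg1: str  # entity id of the first argument
    arg2: str  # entity id of the second argument


def parse_ann(ann: str) -> tuple[dict[str, Entity], list[Relation]]:
    """Parse T (entity) and R (relation) lines of an assumed BRAT-style .ann string.

    Event (E), attribute (A), note (#) lines and discontinuous spans are skipped.
    """
    entities: dict[str, Entity] = {}
    relations: list[Relation] = []
    for line in ann.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        ann_id = fields[0]
        if ann_id.startswith("T") and ";" not in fields[1]:
            etype, start, end = fields[1].split(" ")
            entities[ann_id] = Entity(ann_id, etype, int(start), int(end), fields[2])
        elif ann_id.startswith("R"):
            rtype, arg1, arg2 = fields[1].split(" ")
            # Arguments look like "Arg1:T1"; keep only the entity id.
            relations.append(Relation(ann_id, rtype, arg1.split(":", 1)[1], arg2.split(":", 1)[1]))
    return entities, relations


def nested_pairs(entities: dict[str, Entity]) -> list[tuple[str, str]]:
    """Return (outer, inner) pairs where inner's span lies strictly inside outer's."""
    pairs = []
    for outer in entities.values():
        for inner in entities.values():
            contained = outer.start <= inner.start and inner.end <= outer.end
            same_span = (outer.start, outer.end) == (inner.start, inner.end)
            if contained and not same_span:
                pairs.append((outer.id, inner.id))
    return pairs


def sentence_ends(text: str) -> list[int]:
    """Naive splitter: a sentence ends right after '.', '!' or '?'."""
    return [i + 1 for i, ch in enumerate(text) if ch in ".!?"]


def relation_level(rel: Relation, entities: dict[str, Entity], ends: list[int]) -> str:
    """'sentence' if both arguments start in the same sentence, else 'document'."""
    def sent_index(offset: int) -> int:
        return bisect.bisect_right(ends, offset)

    a, b = entities[rel.arg1], entities[rel.arg2]
    return "sentence" if sent_index(a.start) == sent_index(b.start) else "document"


# Hypothetical example; offsets index into SAMPLE_TEXT.
SAMPLE_TEXT = "Ivan Petrov works at Moscow State University. He lives in Moscow."
SAMPLE_ANN = "\n".join([
    "T1\tPERSON 0 11\tIvan Petrov",
    "T2\tORGANIZATION 21 44\tMoscow State University",
    "T3\tCITY 21 27\tMoscow",  # nested inside T2
    "T4\tCITY 58 64\tMoscow",
    "R1\tWORKPLACE Arg1:T1 Arg2:T2",
    "R2\tHEADQUARTERED_IN Arg1:T2 Arg2:T3",  # relation within a nested entity
    "R3\tLIVES_IN Arg1:T1 Arg2:T4",  # arguments in different sentences
])

if __name__ == "__main__":
    ents, rels = parse_ann(SAMPLE_ANN)
    ends = sentence_ends(SAMPLE_TEXT)
    print(nested_pairs(ents))  # [('T2', 'T3')]
    for r in rels:
        print(r.id, r.type, relation_level(r, ents, ends))
```

Running the script reports `T3` ("Moscow") as nested in `T2` ("Moscow State University"). It classifies R1 and R2 as sentence-level and R3 as document-level. The punctuation-based splitter is deliberately crude. Real Russian text would need a proper sentence tokenizer to handle abbreviations and initials.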

Code Implementations: 1 repo