CL · Jun 10, 2022

RuCoCo: a new Russian corpus with coreference annotation

arXiv:2206.04925v1 · 4 citations · h-index: 2 · Has Code
Originality: Synthesis-oriented
AI Analysis

RuCoCo is a valuable resource for NLP researchers working on Russian coreference resolution, though the contribution is incremental: it applies existing annotation methods to a new language-specific dataset.

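The authors address the lack of a large, high-quality coreference-annotated corpus for Russian by creating RuCoCo, a corpus of news texts with one million words and around 150,000 mentions. Part of the corpus was annotated from scratch; for the rest, human annotators refined machine-generated annotations. The corpus was built with the goal of maintaining high inter-annotator agreement and is publicly available.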

We present a new corpus with coreference annotation, Russian Coreference Corpus (RuCoCo). The goal of RuCoCo is to obtain a large number of annotated texts while maintaining high inter-annotator agreement. RuCoCo contains news texts in Russian, part of which were annotated from scratch, and for the rest the machine-generated annotations were refined by human annotators. The size of our corpus is one million words and around 150,000 mentions. We make the corpus publicly available.

Code Implementations: 1 repo
