RuCoCo: a new Russian corpus with coreference annotation
RuCoCo is a valuable resource for NLP researchers working on Russian coreference resolution, though the contribution is incremental: it applies existing annotation methods to a new language-specific dataset.
The authors address the lack of a large, high-quality coreference-annotated corpus for Russian by creating RuCoCo, a publicly available corpus of one million words and around 150,000 mentions, built with the aim of maintaining high inter-annotator agreement.
We present the Russian Coreference Corpus (RuCoCo), a new corpus with coreference annotation. The goal of RuCoCo is to obtain a large number of annotated texts while maintaining high inter-annotator agreement. RuCoCo contains news texts in Russian: some were annotated from scratch, while for the rest, machine-generated annotations were refined by human annotators. The corpus contains one million words and around 150,000 mentions. We make the corpus publicly available.