CL | Apr 1, 2024

KoCoNovel: Annotated Dataset of Character Coreference in Korean Novels

arXiv:2404.01140v2 | 2 citations | h-index: 1
Originality: Synthesis-oriented
AI Analysis

This work addresses the lack of literary coreference resources for Korean, which benefits NLP researchers and practitioners. The contribution is incremental, however, because it extends existing dataset creation methods to a new domain.

The authors introduce KoCoNovel, an annotated dataset for character coreference in Korean novels comprising 178K tokens from 50 texts, and show that it improves the performance of a BERT-based model on literary character coreference compared to a larger non-literary coreference dataset.

In this paper, we present KoCoNovel, a novel character coreference dataset derived from Korean literary texts, complete with detailed annotation guidelines. Comprising 178K tokens from 50 modern and contemporary novels, KoCoNovel stands as one of the largest public coreference resolution corpora in Korean, and the first to be based on literary texts. KoCoNovel offers four distinct versions to accommodate a wide range of literary coreference analysis needs. These versions are designed to support perspectives of the omniscient author or readers, and to manage multiple entities as either separate or overlapping, thereby broadening its applicability. One of KoCoNovel's distinctive features is that 24% of all character mentions are single common nouns, lacking possessive markers or articles. This feature is particularly influenced by the nuances of Korean address term culture, which favors the use of terms denoting social relationships and kinship over personal names. In experiments with a BERT-based coreference model, we observe notable performance enhancements with KoCoNovel in character coreference tasks within literary texts, compared to a larger non-literary coreference dataset. Such findings underscore KoCoNovel's potential to significantly enhance coreference resolution models through the integration of Korean cultural and linguistic dynamics.

Code Implementations: 1 repo
