CL · Jan 31, 2024

[Lions: 1] and [Tigers: 2] and [Bears: 3], Oh My! Literary Coreference Annotation with LLMs

arXiv:2401.17922v1 · 26.7105 citations · h-index: 14 · LATECHCLFL

Originality: Incremental advance

AI Analysis

This work addresses the challenge of building high-quality coreference systems for fiction, a known bottleneck in computational literary studies, and applies a novel method to it.

The paper tackles coreference annotation and resolution in literary fiction, a task made difficult by complex structured outputs and the subtle inferences that literary text requires. The authors develop and evaluate language-model-based seq2seq systems that directly generate annotated text, and they release several trained models along with a training workflow.

Coreference annotation and resolution is a vital component of computational literary studies. However, it has previously been difficult to build high quality systems for fiction. Coreference requires complicated structured outputs, and literary text involves subtle inferences and highly varied language. New language-model-based seq2seq systems present the opportunity to solve both these problems by learning to directly generate a copy of an input sentence with markdown-like annotations. We create, evaluate, and release several trained models for coreference, as well as a workflow for training new models.
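
To make the "copy of the input with markdown-like annotations" idea concrete, here is a minimal sketch of how such generated output could be turned back into mention spans and clusters. The bracket format `[mention: id]` is an assumption inferred from the paper's title, not a confirmed specification of the released models' output, and `parse_annotated` is a hypothetical helper written for illustration.

```python
import re
from collections import defaultdict

# Assumed annotation format, inferred from the paper's title:
#   [mention text: cluster_id]
# The real models' output format may differ.
MENTION = re.compile(r"\[([^\[\]]+?):\s*(\d+)\]")


def parse_annotated(annotated):
    """Return (plain_text, clusters) from a bracket-annotated string.

    clusters maps a cluster id to a list of (start, end, text) spans,
    with offsets relative to plain_text. Nested mentions are not
    handled in this sketch.
    """
    plain_parts = []
    clusters = defaultdict(list)
    cursor = 0      # read position in the annotated string
    plain_len = 0   # length of the plain text emitted so far
    for m in MENTION.finditer(annotated):
        # Copy the unannotated text that precedes this mention.
        before = annotated[cursor:m.start()]
        plain_parts.append(before)
        plain_len += len(before)
        # Record the mention span, then copy its text without brackets.
        text, cluster_id = m.group(1), int(m.group(2))
        clusters[cluster_id].append((plain_len, plain_len + len(text), text))
        plain_parts.append(text)
        plain_len += len(text)
        cursor = m.end()
    plain_parts.append(annotated[cursor:])
    return "".join(plain_parts), dict(clusters)


plain, clusters = parse_annotated("[Alice: 1] saw [her: 1] cat.")
print(plain)
print(clusters)
```

Because the annotated string is a copy of the input plus markup, stripping the brackets should reproduce the source sentence exactly, which gives a cheap sanity check on any generated annotation.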
