CL · Mar 12, 2025

Towards Generating Automatic Anaphora Annotations

arXiv:2503.09417v2 · 2 citations · h-index: 13
Originality: Synthesis-oriented
AI Analysis

This work addresses the need for large annotated datasets in nuanced NLP tasks like anaphora resolution, but it appears incremental as it builds on existing methods without claiming major breakthroughs.

The paper tackles the problem of high costs for manual annotation in NLP by exploring two methods for automatically generating coreferential annotations: direct conversion from existing datasets and parsing with multilingual models, detailing current progress and challenges.

Training models that can perform well on various NLP tasks requires large amounts of data, and this need becomes more apparent with nuanced tasks such as anaphora and coreference resolution. To combat the prohibitive costs of creating manual gold annotated data, this paper explores two methods to automatically create datasets with coreferential annotations: direct conversion from existing datasets, and parsing using multilingual models capable of handling new and unseen languages. The paper details the current progress on those two fronts, as well as the challenges the efforts currently face, and our approach to overcoming these challenges.
