Towards Generating Automatic Anaphora Annotations
This work addresses the need for large annotated datasets in nuanced NLP tasks like anaphora resolution, but it appears incremental as it builds on existing methods without claiming major breakthroughs.
The paper tackles the problem of high costs for manual annotation in NLP by exploring two methods for automatically generating coreferential annotations: direct conversion from existing datasets and parsing with multilingual models, detailing current progress and challenges.
Training models that perform well on various NLP tasks requires large amounts of data, and this need becomes more apparent with nuanced tasks such as anaphora and coreference resolution. To combat the prohibitive cost of creating manually annotated gold data, this paper explores two methods for automatically creating datasets with coreferential annotations: direct conversion from existing datasets, and parsing with multilingual models capable of handling new and unseen languages. The paper details the current progress on both fronts, the challenges these efforts currently face, and our approach to overcoming them.