CL · Oct 21, 2024

Findings of the Third Shared Task on Multilingual Coreference Resolution

Michal Novák, Barbora Dohnalová, Miloslav Konopík, Anna Nedoluzhko, Martin Popel, Ondřej Pražák, Jakub Sido, Milan Straka, Zdeněk Žabokrtský, Daniel Zeman

arXiv:2410.15949v215.224 citations · h-index: 28 · Has Code · CRAC

Originality: Synthesis-oriented

AI Analysis

This is an incremental update to an established shared task. It makes multilingual coreference resolution more realistic by withholding gold slots for zero anaphora, and more linguistically diverse by adding languages, including historical ones.

The paper describes the third edition of a shared task on multilingual coreference resolution. This edition raised the task's complexity by no longer providing gold slots for zero anaphora, and it expanded the language coverage, particularly with historical languages. The data came from CorefUD 1.2, covering 21 datasets in 15 languages, and 6 systems competed.

The paper presents an overview of the third edition of the shared task on multilingual coreference resolution, held as part of the CRAC 2024 workshop. Similarly to the previous two editions, the participants were challenged to develop systems capable of identifying mentions and clustering them based on identity coreference. This year's edition took another step towards real-world application by not providing participants with gold slots for zero anaphora, increasing the task's complexity and realism. In addition, the shared task was expanded to include a more diverse set of languages, with a particular focus on historical languages. The training and evaluation data were drawn from version 1.2 of the multilingual collection of harmonized coreference resources CorefUD, encompassing 21 datasets across 15 languages. 6 systems competed in this shared task.
