CL · Feb 1, 2023

AmbiCoref: Evaluating Human and Model Sensitivity to Ambiguous Coreference

Yuewei Yuan, Chaitanya Malaviya, Mark Yatskar

arXiv:2302.00762v2 · 28.4272 citations · h-index: 28 · Has Code

Originality: Synthesis-oriented

AI Analysis

This work addresses model robustness in natural language processing, a concern for both researchers and practitioners. The contribution is incremental, since it builds on existing psycholinguistic studies and diagnostic methods.

The paper asks whether modern coreference resolution models are sensitive to pronominal ambiguity. To test this, the authors construct AmbiCoref, a diagnostic corpus of minimal sentence pairs with ambiguous and unambiguous referents. They find that humans are less sure of referents in the ambiguous cases, while most models show little difference in output between ambiguous and unambiguous pairs.

Given a sentence "Abby told Brittney that she upset Courtney", one would struggle to understand who "she" refers to, and ask for clarification. However, if the word "upset" were replaced with "hugged", "she" unambiguously refers to Abby. We study if modern coreference resolution models are sensitive to such pronominal ambiguity. To this end, we construct AmbiCoref, a diagnostic corpus of minimal sentence pairs with ambiguous and unambiguous referents. Our examples generalize psycholinguistic studies of human perception of ambiguity around particular arrangements of verbs and their arguments. Analysis shows that (1) humans are less sure of referents in ambiguous AmbiCoref examples than unambiguous ones, and (2) most coreference models show little difference in output between ambiguous and unambiguous pairs. We release AmbiCoref as a diagnostic corpus for testing whether models treat ambiguity similarly to humans.
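The minimal-pair idea in the abstract can be illustrated with a short sketch. This is not the authors' generation code: the template string, the `MinimalPair` class, and the helper names are hypothetical, and the only data used is the single example sentence quoted above. The sketch assumes both verbs are single tokens, so the two sentences differ in exactly one position.

```python
from dataclasses import dataclass

# Hypothetical template built from the abstract's example sentence.
TEMPLATE = "{a} told {b} that she {verb} {c}"


@dataclass(frozen=True)
class MinimalPair:
    """Two sentences that differ only in the verb (illustrative structure)."""
    ambiguous: str
    unambiguous: str


def make_pair(a, b, c, ambiguous_verb, unambiguous_verb):
    """Fill the template twice, changing only the verb."""
    return MinimalPair(
        ambiguous=TEMPLATE.format(a=a, b=b, verb=ambiguous_verb, c=c),
        unambiguous=TEMPLATE.format(a=a, b=b, verb=unambiguous_verb, c=c),
    )


def differing_tokens(pair):
    """Return the (ambiguous, unambiguous) token pairs that differ.

    Assumes both sentences have the same number of whitespace tokens,
    which holds when each verb is a single word.
    """
    amb = pair.ambiguous.split()
    unamb = pair.unambiguous.split()
    return [(x, y) for x, y in zip(amb, unamb) if x != y]


pair = make_pair("Abby", "Brittney", "Courtney", "upset", "hugged")
print(pair.ambiguous)
print(pair.unambiguous)
print(differing_tokens(pair))
```

A diagnostic built this way would feed both sentences of a pair to a coreference model and compare its output for "she" across the two; how that comparison is scored is left open here, since the abstract does not specify it.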
