Coreference Reasoning in Machine Reading Comprehension
This work matters to researchers and developers in natural language processing and machine reading comprehension: it exposes a critical limitation in how coreference reasoning is currently evaluated and offers a path toward more robust model training and assessment.
This paper addresses the challenge of coreference reasoning in Machine Reading Comprehension (MRC) by demonstrating that existing MRC datasets do not accurately reflect the natural distribution and difficulties of coreference reasoning. The authors propose a new methodology for creating MRC datasets that better capture these challenges, and they show that state-of-the-art models still struggle with coreference phenomena on their newly created dataset. They also develop a method to improve coreference reasoning in state-of-the-art models by leveraging existing coreference resolution datasets for training.
Coreference resolution is essential for natural language understanding and has long been studied in NLP. In recent years, as the format of Question Answering (QA) became a standard for machine reading comprehension (MRC), there have been data collection efforts, e.g., Dasigi et al. (2019), that attempt to evaluate the ability of MRC models to reason about coreference. However, as we show, coreference reasoning in MRC is a greater challenge than previously thought; MRC datasets do not reflect the natural distribution and, consequently, the challenges of coreference reasoning. Specifically, success on these datasets does not reflect a model's proficiency in coreference reasoning. We propose a methodology for creating MRC datasets that better reflect the challenges of coreference reasoning and use it to create a sample evaluation set. The results on our dataset show that state-of-the-art models still struggle with these phenomena. Furthermore, we develop an effective way to use naturally occurring coreference phenomena from existing coreference resolution datasets when training MRC models. This allows us to show an improvement in the coreference reasoning abilities of state-of-the-art models. The code and the resulting dataset are available at https://github.com/UKPLab/coref-reasoning-in-qa.
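To make the idea of reusing coreference resolution data for MRC training more concrete, the following is a minimal illustrative sketch of one way coreference clusters could be recast as QA-style examples. It is not the authors' conversion procedure: the `<ref>` markers, the "nearest preceding non-pronominal mention" heuristic, the small pronoun list, and all names (`QAExample`, `coref_to_qa`) are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Span = Tuple[int, int]  # token offsets: start inclusive, end exclusive

# Hypothetical, deliberately small pronoun list used by the heuristic below.
PRONOUNS = {"he", "she", "it", "they", "him", "her", "them", "his", "its", "their"}


@dataclass
class QAExample:
    context: str   # the full passage
    question: str  # sentence of the anaphor, with the anaphor marked
    answer: str    # text of the chosen antecedent


def _is_pronoun(tokens: List[str], span: Span) -> bool:
    start, end = span
    return end - start == 1 and tokens[start].lower() in PRONOUNS


def coref_to_qa(
    tokens: List[str],
    clusters: List[List[Span]],
    sentence_bounds: List[Span],
) -> List[QAExample]:
    """Turn each non-initial mention into a question about its antecedent.

    Assumption: every mention lies entirely inside one sentence.
    """
    context = " ".join(tokens)
    examples: List[QAExample] = []
    for cluster in clusters:
        mentions = sorted(cluster)
        for i in range(1, len(mentions)):
            start, end = mentions[i]
            # Assumed heuristic: nearest preceding non-pronominal mention.
            antecedent: Optional[Span] = None
            for candidate in reversed(mentions[:i]):
                if not _is_pronoun(tokens, candidate):
                    antecedent = candidate
                    break
            if antecedent is None:
                continue  # no informative antecedent, skip this mention
            # Find the sentence that contains the anaphor.
            sent = next(((s, e) for s, e in sentence_bounds if s <= start < e), None)
            if sent is None:
                continue
            s, e = sent
            # Mark the anaphor inside its sentence to form the question.
            question_tokens = (
                tokens[s:start] + ["<ref>"] + tokens[start:end] + ["</ref>"] + tokens[end:e]
            )
            examples.append(
                QAExample(
                    context=context,
                    question=" ".join(question_tokens),
                    answer=" ".join(tokens[antecedent[0]:antecedent[1]]),
                )
            )
    return examples
```

Under these assumptions, a passage such as "Mary bought a car . She loves it ." with clusters {Mary, She} and {a car, it} would yield two examples, one asking about the marked "She" and one about the marked "it", which could then be mixed into the training data of an extractive MRC model.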