Design and Challenges of Cloze-Style Reading Comprehension Tasks on Multiparty Dialogue
This work addresses challenges in evaluating reading comprehension models on multiparty dialogue, an incremental but important step toward improving AI's understanding of daily conversations.
The paper identifies limitations in existing cloze-style reading comprehension tasks for multiparty dialogue, showing that replacing the random data split with a chronological one reduces test accuracy on the earlier single-variable passage completion task from 72% to 34%, and proposes two new tasks for more comprehensive prediction of personal entities in conversations.
This paper analyzes challenges in cloze-style reading comprehension on multiparty dialogue and proposes two new tasks for more comprehensive prediction of personal entities in daily conversations. We first demonstrate that the evaluation methods of previous work have a substantial limitation: randomly assigning samples to training and test data substantially decreases the complexity of cloze-style reading comprehension. According to our analysis, replacing the random data split with a chronological data split reduces test accuracy on the previous single-variable passage completion task from 72% to 34%, which leaves much more room for improvement. Our proposed tasks extend the previous single-variable passage completion task by replacing more character mentions with variables. Several deep learning models are developed to validate these three tasks. A thorough error analysis is provided to understand the challenges and guide the future direction of this research.
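The difference between the two data splits discussed above can be illustrated with a minimal sketch. It is not the paper's code: the `episode` field, the `test_fraction` parameter, and all function names are hypothetical, and the sketch assumes each sample carries an integer index giving its chronological position in the source dialogue. A random split scatters samples from the same part of the dialogue across training and test data, while a chronological split holds out only the latest part.

```python
import random


def random_split(samples, test_fraction=0.2, seed=0):
    """Shuffle all samples, then hold out a fraction of them as test data."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[:-n_test], shuffled[-n_test:]


def chronological_split(samples, test_fraction=0.2):
    """Hold out the latest episodes (by the assumed 'episode' index) as test data."""
    episodes = sorted({s["episode"] for s in samples})
    n_test = max(1, round(len(episodes) * test_fraction))
    test_episodes = set(episodes[-n_test:])
    train = [s for s in samples if s["episode"] not in test_episodes]
    test = [s for s in samples if s["episode"] in test_episodes]
    return train, test


def episode_overlap(train, test):
    """Return the episodes that contribute samples to both training and test data."""
    return {s["episode"] for s in train} & {s["episode"] for s in test}


# Toy data (hypothetical): 10 episodes with 5 cloze queries each.
samples = [{"episode": e, "query": f"q{e}-{i}"} for e in range(10) for i in range(5)]

rand_train, rand_test = random_split(samples)
chron_train, chron_test = chronological_split(samples)

# Episodes shared between training and test data under each split.
print(sorted(episode_overlap(rand_train, rand_test)))
print(sorted(episode_overlap(chron_train, chron_test)))
```

Under the chronological split no episode appears on both sides, so a model cannot rely on context it has already seen from the same part of the dialogue during training. This is one plausible reading of why the random split makes the task easier.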