Narrative Question Answering with Cutting-Edge Open-Domain QA Techniques: A Comprehensive Study
This work quantifies why Book QA is difficult through a comprehensive analysis on the NarrativeQA dataset, and improves on the published state of the art.
This study tackles question answering over book stories (Book QA), which lags behind open-domain QA, by benchmarking cutting-edge open-domain QA techniques on the NarrativeQA dataset, achieving a ~7% absolute improvement on Rouge-L. It also analyzes the task's challenges through human studies, finding that event-centric questions dominate and expose the limitations of existing models in event-oriented scenarios.
Recent advancements in open-domain question answering (ODQA), i.e., finding answers from a large open-domain corpus such as Wikipedia, have led to human-level performance on many datasets. However, progress in QA over book stories (Book QA) lags behind, despite its task formulation being similar to that of ODQA. This work provides a comprehensive and quantitative analysis of the difficulty of Book QA: (1) We benchmark the NarrativeQA dataset through extensive experiments with cutting-edge ODQA techniques. This quantifies the challenges Book QA poses and advances the published state-of-the-art with a $\sim$7\% absolute improvement on Rouge-L. (2) We further analyze the detailed challenges in Book QA through human studies.\footnote{\url{https://github.com/gorov/BookQA}.} Our findings indicate that event-centric questions dominate this task, which exemplifies the inability of existing QA models to handle event-oriented scenarios.
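To make the reported metric concrete, the following is a minimal sketch of a sentence-level Rouge-L score, computed as an F-measure over the longest common subsequence (LCS) of tokens. It is an illustration rather than the evaluation script used in this work: the function names `lcs_length` and `rouge_l`, the whitespace tokenization, the lowercasing, and the default `beta=1.0` are all assumptions. On this 0 to 1 scale, an absolute improvement of 7% corresponds to a gain of 0.07 in the score, not a 7% relative gain over the baseline.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference, beta=1.0):
    """LCS-based F-measure between a candidate and a reference answer.

    Assumption: simple lowercased whitespace tokenization; official
    scorers may tokenize, stem, or weight recall differently.
    """
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return (1 + beta**2) * precision * recall / (recall + beta**2 * precision)


# Hypothetical answers, made up for illustration only.
score = rouge_l("he fled the city", "he left the city at night")
print(round(score, 3))
```

Because the LCS preserves word order but allows gaps, the score rewards answers that recover the reference's content in sequence even when they are not exact matches, which suits the free-form answers in NarrativeQA.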