CL AI Jun 7, 2019

Compositional Questions Do Not Necessitate Multi-hop Reasoning

Sewon Min, Eric Wallace, Sameer Singh, Matt Gardner, Hannaneh Hajishirzi, Luke Zettlemoyer

arXiv:1906.02900v132.31203 citations Has Code

Originality: Synthesis-oriented

AI Analysis

This work highlights a potential flaw in multi-hop reasoning datasets and suggests a need for better evaluation methods. It is incremental, however, in that it critiques existing benchmarks without introducing a new solution.

The paper argues that many compositional questions in multi-hop reading comprehension datasets such as HotpotQA can be answered with single-hop reasoning. Two results support this: a single-hop BERT-based model achieves 67 F1, comparable to state-of-the-art multi-hop models, and human participants answered over 80% of questions without being shown all of the paragraphs needed for the intended multi-hop reasoning.

Multi-hop reading comprehension (RC) questions are challenging because they require reading and reasoning over multiple paragraphs. We argue that it can be difficult to construct large multi-hop RC datasets. For example, even highly compositional questions can be answered with a single hop if they target specific entity types, or the facts needed to answer them are redundant. Our analysis is centered on HotpotQA, where we show that single-hop reasoning can solve much more of the dataset than previously thought. We introduce a single-hop BERT-based RC model that achieves 67 F1, comparable to state-of-the-art multi-hop models. We also design an evaluation setting where humans are not shown all of the necessary paragraphs for the intended multi-hop reasoning but can still answer over 80% of questions. Together with detailed error analysis, these results suggest there should be an increasing focus on the role of evidence in multi-hop reasoning and possibly even a shift towards information retrieval style evaluations with large and diverse evidence collections.
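
For readers unfamiliar with the "67 F1" figure: in extractive reading comprehension, F1 is commonly computed as token overlap between a predicted answer string and the gold answer, averaged over questions. The following is a minimal sketch of that idea, not the paper's evaluation script. The function name `token_f1` is hypothetical, and the normalization here (lowercasing and whitespace splitting only) is a simplifying assumption; standard evaluation scripts typically apply further answer normalization.

```python
from collections import Counter


def token_f1(prediction: str, gold: str) -> float:
    """Token-overlap F1 between a predicted and a gold answer string.

    Assumption: tokens are lowercased, whitespace-separated words.
    """
    pred_tokens = prediction.lower().split()
    gold_tokens = gold.lower().split()

    # If either side is empty, score 1.0 only when both are empty.
    if not pred_tokens or not gold_tokens:
        return float(pred_tokens == gold_tokens)

    # Multiset intersection counts each shared token at most as often
    # as it appears in both strings.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0

    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```

Under this sketch, a prediction of "Obama" against a gold answer of "Barack Obama" has precision 1 and recall 1/2, so it earns partial credit rather than zero. A dataset-level score such as the one quoted above would be the mean of per-question scores, expressed as a percentage.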
