CL · AI · Jun 7, 2019

Compositional Questions Do Not Necessitate Multi-hop Reasoning

arXiv:1906.02900v11203 citations
Originality: Synthesis-oriented
AI Analysis

This work highlights a potential flaw in multi-hop reasoning datasets and points to a need for better evaluation methods. Its contribution is incremental: it critiques existing benchmarks without introducing a new solution.

The paper argues that many compositional questions in multi-hop reading comprehension datasets such as HotpotQA can be answered with single-hop reasoning. The evidence is twofold: a single-hop BERT-based model reaches 67 F1, comparable to multi-hop models, and in a human evaluation participants answered over 80% of questions without seeing all of the paragraphs needed for the intended multi-hop reasoning.

Multi-hop reading comprehension (RC) questions are challenging because they require reading and reasoning over multiple paragraphs. We argue that it can be difficult to construct large multi-hop RC datasets. For example, even highly compositional questions can be answered with a single hop if they target specific entity types, or the facts needed to answer them are redundant. Our analysis is centered on HotpotQA, where we show that single-hop reasoning can solve much more of the dataset than previously thought. We introduce a single-hop BERT-based RC model that achieves 67 F1, comparable to state-of-the-art multi-hop models. We also design an evaluation setting where humans are not shown all of the necessary paragraphs for the intended multi-hop reasoning but can still answer over 80% of questions. Together with detailed error analysis, these results suggest there should be an increasing focus on the role of evidence in multi-hop reasoning and possibly even a shift towards information retrieval style evaluations with large and diverse evidence collections.
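To make "single-hop" and "F1" concrete, here is a minimal Python sketch, not the paper's implementation. It assumes a reader that scores each paragraph in isolation and returns an answer with a confidence, so no information ever flows between paragraphs. The names `token_f1`, `single_hop_answer`, and `Reader` are hypothetical. The F1 here is a simplified token-overlap score that only lowercases and splits on whitespace, whereas standard QA evaluation scripts typically also strip punctuation and articles.

```python
from collections import Counter
from typing import Callable, List, Tuple


def token_f1(prediction: str, gold: str) -> float:
    """Simplified token-overlap F1 between a predicted and a gold answer."""
    pred_tokens = prediction.lower().split()
    gold_tokens = gold.lower().split()
    if not pred_tokens or not gold_tokens:
        # Both empty counts as a match; exactly one empty is a miss.
        return float(pred_tokens == gold_tokens)
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


# Assumption: a reader maps (question, paragraph) to (answer, confidence).
Reader = Callable[[str, str], Tuple[str, float]]


def single_hop_answer(question: str, paragraphs: List[str], reader: Reader) -> str:
    """Read every paragraph independently and keep the most confident answer."""
    candidates = [reader(question, paragraph) for paragraph in paragraphs]
    best_answer, _ = max(candidates, key=lambda candidate: candidate[1])
    return best_answer
```

In practice the `reader` argument would wrap a trained RC model. The point of the sketch is only that the final answer is drawn from a single paragraph, which is the setting the abstract reports as competitive with multi-hop models.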

Code Implementations: 1 repo
