CL | AI | Feb 23, 2020

Do Multi-Hop Question Answering Systems Know How to Answer the Single-Hop Sub-Questions?

arXiv:2002.09919v2811 citations
AI Analysis

This work addresses the explainability of multi-hop QA systems, showing that current models lack human-like reasoning.

The paper investigates whether state-of-the-art multi-hop question answering models can correctly answer the underlying single-hop sub-questions, finding they fail on a large portion despite correctly answering the multi-hop questions, indicating reliance on partial clues rather than true reasoning.

Multi-hop question answering (QA) requires a model to retrieve and integrate information from different parts of a long text to answer a question. Humans answer this kind of complex question via a divide-and-conquer approach. In this paper, we investigate whether top-performing models for multi-hop questions understand the underlying sub-questions like humans. We adopt a neural decomposition model to generate sub-questions for a multi-hop complex question, followed by extracting the corresponding sub-answers. We show that multiple state-of-the-art multi-hop QA models fail to correctly answer a large portion of sub-questions, although their corresponding multi-hop questions are correctly answered. This indicates that these models manage to answer the multi-hop questions using some partial clues, instead of truly understanding the reasoning paths. We also propose a new model which significantly improves the performance on answering the sub-questions. Our work takes a step forward towards building a more explainable multi-hop QA system.
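The comparison described in the abstract (a multi-hop question answered correctly while its sub-questions are not) can be illustrated with a small sketch. This is not the paper's evaluation code. The `Example` record, the `partial_clue_rate` function, and the toy data are all hypothetical, and correctness is assumed to be a precomputed boolean for each question.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Example:
    """Hypothetical record: correctness flags for one multi-hop question."""
    multi_hop_correct: bool       # did the model answer the multi-hop question?
    sub_correct: List[bool]       # one flag per generated sub-question


def partial_clue_rate(examples: List[Example]) -> float:
    """Among correctly answered multi-hop questions, return the fraction
    where at least one sub-question was answered incorrectly."""
    solved = [e for e in examples if e.multi_hop_correct]
    if not solved:
        return 0.0
    inconsistent = sum(1 for e in solved if not all(e.sub_correct))
    return inconsistent / len(solved)


# Toy data, invented for illustration only (not results from the paper).
toy = [
    Example(True, [True, True]),    # consistent: every hop answered
    Example(True, [False, True]),   # multi-hop right, first hop wrong
    Example(True, [True, False]),   # multi-hop right, second hop wrong
    Example(False, [True, True]),   # excluded: multi-hop answer wrong
]

rate = partial_clue_rate(toy)
print(f"partial-clue rate: {rate:.2f}")
```

A high value of this rate would correspond to the behaviour the abstract reports: correct final answers that are not backed by correct answers to the intermediate hops.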

