CL, LG | Jul 1, 2023

Single Sequence Prediction over Reasoning Graphs for Multi-hop QA

Gowtham Ramesh, Makesh Sreedhar, Junjie Hu

arXiv:2307.00335v1 | 26.2224 citations | h-index: 18 | Has Code

Originality: Incremental advance

AI Analysis

The paper addresses unfaithful reasoning paths in multi-hop QA, which matters to users who need answers that are both accurate and interpretable. It is an incremental improvement over existing generative methods.

The paper tackles inaccurate passage identification in multi-hop question answering, which leads to incorrect reasoning paths. It proposes a single-sequence prediction method over a local reasoning graph, using the graph structure to connect key entities across passages. The results show significant improvements in answer exact-match/F1 scores and faithfulness on HotpotQA, and the method achieves state-of-the-art results on Musique with at most a 4% increase in model parameters.

Recent generative approaches for multi-hop question answering (QA) utilize the fusion-in-decoder method (Izacard and Grave, 2021) to generate a single sequence output which includes both a final answer and a reasoning path taken to arrive at that answer, such as passage titles and key facts from those passages. While such models can lead to better interpretability and high quantitative scores, they often have difficulty accurately identifying the passages corresponding to key entities in the context, resulting in incorrect passage hops and a lack of faithfulness in the reasoning path. To address this, we propose a single-sequence prediction method over a local reasoning graph (SeqGraph; code and models will be released at https://github.com/gowtham1997/SeqGraph) that integrates a graph structure connecting key entities in each context passage to relevant subsequent passages for each question. We use a graph neural network to encode this graph structure and fuse the resulting representations into the entity representations of the model. Our experiments show significant improvements in answer exact-match/F1 scores and faithfulness of grounding in the reasoning path on the HotpotQA dataset and achieve state-of-the-art numbers on the Musique dataset with only up to a 4% increase in model parameters.
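The idea of a local reasoning graph can be illustrated with a toy sketch. This is not the authors' implementation. The title-matching heuristic, the function names `build_local_graph` and `mean_aggregate`, the example passages, and the two-dimensional feature vectors are all assumptions made for illustration, and a single round of mean-neighbor aggregation stands in for the graph neural network mentioned in the abstract.

```python
from typing import Dict, List, Tuple

def build_local_graph(passages: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return (source_title, target_title) edges.

    Assumed heuristic: an edge is added when the title of another
    passage appears verbatim (case-insensitive) in the source text.
    """
    edges = []
    for src_title, text in passages.items():
        lowered = text.lower()
        for tgt_title in passages:
            if tgt_title != src_title and tgt_title.lower() in lowered:
                edges.append((src_title, tgt_title))
    return edges

def mean_aggregate(
    features: Dict[str, List[float]], edges: List[Tuple[str, str]]
) -> Dict[str, List[float]]:
    """One round of mean-neighbor aggregation, fused by residual addition.

    Edges are treated as undirected; isolated nodes keep their features.
    """
    neighbours: Dict[str, List[str]] = {node: [] for node in features}
    for src, tgt in edges:
        neighbours[src].append(tgt)
        neighbours[tgt].append(src)

    fused = {}
    for node, vec in features.items():
        nbrs = neighbours[node]
        if not nbrs:
            fused[node] = list(vec)
            continue
        mean = [
            sum(features[n][i] for n in nbrs) / len(nbrs)
            for i in range(len(vec))
        ]
        fused[node] = [v + m for v, m in zip(vec, mean)]
    return fused

# Hypothetical two-hop example: film -> director -> birthplace.
passages = {
    "Film A": "Film A was directed by Jane Doe.",
    "Jane Doe": "Jane Doe was born in Springfield.",
    "Springfield": "Springfield is a city.",
}
edges = build_local_graph(passages)

# Hypothetical node features; in a real model these would be encoder states.
features = {
    "Film A": [1.0, 0.0],
    "Jane Doe": [0.0, 1.0],
    "Springfield": [1.0, 1.0],
}
fused = mean_aggregate(features, edges)
print(edges)
print(fused)
```

In this sketch the middle passage ("Jane Doe") receives information from both of its neighbors, which is the kind of cross-passage signal the abstract describes fusing into entity representations.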

