CL · Oct 24, 2023

MarkQA: A large scale KBQA dataset with numerical reasoning

Xiang Huang, Sitao Cheng, Yuheng Bao, Shanshan Huang, Yuzhong Qu

arXiv:2310.15517v2 · 21.4133 citations · h-index: 5 · Has Code

Originality: Synthesis-oriented

AI Analysis

This work addresses a gap in KBQA: complex numerical reasoning. The contribution is incremental, since it extends existing KBQA tasks with numerical reasoning rather than defining an entirely new problem.

The authors address question answering over knowledge bases with complex numerical reasoning by introducing a new task, NR-KBQA, and a large dataset, MarkQA, whose questions come with step-by-step reasoning annotations. Their experiments show that existing state-of-the-art methods struggle on this dataset.

While question answering over knowledge bases (KBQA) has shown progress in addressing factoid questions, KBQA with numerical reasoning remains relatively unexplored. In this paper, we focus on complex numerical reasoning in KBQA and propose a new task, NR-KBQA, which requires both multi-hop reasoning and numerical reasoning. We design a logic form in Python format called PyQL to represent the reasoning process of numerical reasoning questions. To facilitate the development of NR-KBQA, we present a large dataset called MarkQA, which is automatically constructed from a small set of seeds. Each question in MarkQA is equipped with its corresponding SPARQL query, alongside the step-by-step reasoning process in the QDMR format and PyQL program. Experimental results of some state-of-the-art QA methods on MarkQA show that complex numerical reasoning in KBQA faces great challenges.
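To make the task concrete, the following minimal sketch shows the kind of question that needs both multi-hop and numerical reasoning. Everything in it is an assumption made for illustration: the toy knowledge base, the fictional entities and values, and the helper functions `hop`, `capital_population`, and `population_gap`. It is not PyQL, it does not generate SPARQL, and it uses no data from MarkQA.

```python
# Toy knowledge base mapping (subject, relation) -> object.
# All entities and numbers are fictional and exist only for this sketch.
KB = {
    ("Atlantis", "capital"): "Poseidonia",
    ("Lemuria", "capital"): "Mu City",
    ("Poseidonia", "population"): 1_200_000,
    ("Mu City", "population"): 450_000,
}


def hop(entity, relation):
    """Follow a single relation edge from an entity (one reasoning hop)."""
    return KB[(entity, relation)]


def capital_population(country):
    """Multi-hop lookup: country -> capital -> population."""
    capital = hop(country, "capital")
    return hop(capital, "population")


def population_gap(country_a, country_b):
    """Numerical step applied to the results of two multi-hop lookups."""
    return capital_population(country_a) - capital_population(country_b)


# Question: "How many more people live in the capital of Atlantis
# than in the capital of Lemuria?"
answer = population_gap("Atlantis", "Lemuria")
print(answer)
```

A factoid KBQA question would stop after the `hop` calls; the subtraction in `population_gap` is the additional numerical step that the abstract describes as the distinguishing requirement of NR-KBQA.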
