DL · CL · Mar 23, 2023

DBLP-QuAD: A Question Answering Dataset over the DBLP Scholarly Knowledge Graph

arXiv:2303.13351v3 · 29 citations · h-index: 27
Originality: Synthesis-oriented
AI Analysis

The dataset gives researchers in the natural language processing and knowledge graph communities a benchmark for question answering systems, though the contribution is incremental because it builds on existing datasets.

The authors address the lack of a large-scale question answering dataset for scholarly knowledge graphs by creating DBLP-QuAD: 10,000 question-answer pairs, each paired with a SPARQL query over the DBLP knowledge graph, which makes it the largest dataset of its kind.

In this work we create a question answering dataset over the DBLP scholarly knowledge graph (KG). DBLP is an online reference for bibliographic information on major computer science publications; it indexes over 4.4 million publications by more than 2.2 million authors. Our dataset consists of 10,000 question-answer pairs with the corresponding SPARQL queries, which can be executed over the DBLP KG to fetch the correct answer. DBLP-QuAD is the largest scholarly question answering dataset.
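The abstract says each question comes with a SPARQL query that can be executed over the DBLP KG to fetch its answer. The Python sketch below illustrates that pairing under stated assumptions: the endpoint URL `https://sparql.dblp.org/sparql`, the `dblp:` schema namespace and the `dblp:yearOfPublication` predicate are assumed, and the `QAPair` fields, the record IRI and the question text are hypothetical placeholders, not the dataset's actual format. It builds (without sending) a standard SPARQL-protocol request and reads answers from a SPARQL 1.1 JSON results document.

```python
import json
from dataclasses import dataclass
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: DBLP's public SPARQL endpoint URL; verify before sending real requests.
DBLP_SPARQL_ENDPOINT = "https://sparql.dblp.org/sparql"


@dataclass
class QAPair:
    """A question with the SPARQL query that fetches its answer (hypothetical field names)."""
    question: str
    sparql: str


def build_request(pair: QAPair, endpoint: str = DBLP_SPARQL_ENDPOINT) -> Request:
    """Build (but do not send) a SPARQL-protocol GET request for the pair's query."""
    url = f"{endpoint}?{urlencode({'query': pair.sparql})}"
    # Standard media type for SPARQL 1.1 JSON results.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def extract_answers(results_json: str) -> list[str]:
    """Collect answer values from a SPARQL 1.1 JSON results document."""
    data = json.loads(results_json)
    if "boolean" in data:  # ASK queries return a single true/false
        return [str(data["boolean"]).lower()]
    variables = data["head"]["vars"]
    return [
        row[var]["value"]
        for row in data["results"]["bindings"]
        for var in variables
        if var in row  # unbound (OPTIONAL) variables are simply absent
    ]


# Hypothetical example: the record IRI is a placeholder, and the
# dblp:yearOfPublication predicate is assumed from DBLP's RDF schema.
pair = QAPair(
    question="In which year was the paper <placeholder> published?",
    sparql=(
        "PREFIX dblp: <https://dblp.org/rdf/schema#> "
        "SELECT ?answer WHERE { <https://dblp.org/rec/conf/example/Placeholder23> "
        "dblp:yearOfPublication ?answer }"
    ),
)
request = build_request(pair)
# urllib.request.urlopen(request) would execute the query against the live KG.
```

Keeping request construction separate from answer extraction means the answer-matching logic can be checked against saved JSON results without network access.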

Code Implementations: 1 repo