CL, May 21, 2020

RuBQ: A Russian Dataset for Question Answering over Wikidata

arXiv:2005.10659v1, 28 citations
Originality: Synthesis-oriented
AI Analysis

RuBQ gives Russian NLP researchers a usable resource, but the contribution is incremental: it adapts existing dataset creation methods to a new language.

The authors tackled the lack of a Russian dataset for knowledge base question answering by creating RuBQ, which includes 1,500 Russian questions with translations, SPARQL queries, and verified answers.

The paper presents RuBQ, the first Russian knowledge base question answering (KBQA) dataset. The high-quality dataset consists of 1,500 Russian questions of varying complexity, their English machine translations, SPARQL queries to Wikidata, reference answers, as well as a Wikidata sample of triples containing entities with Russian labels. The dataset creation started with a large collection of question-answer pairs from online quizzes. The data underwent automatic filtering, crowd-assisted entity linking, automatic generation of SPARQL queries, and their subsequent in-house verification.
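To make the description above concrete, here is a minimal sketch of what one question paired with a SPARQL query to Wikidata might look like. The record layout, the field names, and the `build_one_hop_query` helper are hypothetical and chosen for illustration; they are not RuBQ's actual schema or the authors' query generator, and the sample question is not taken from the dataset. The sketch assumes the `wd:` and `wdt:` prefixes that the Wikidata Query Service predefines.

```python
from dataclasses import dataclass, field


@dataclass
class KbqaRecord:
    """Hypothetical record layout; field names are illustrative, not RuBQ's schema."""
    question_ru: str
    question_en: str
    sparql: str
    answers: list = field(default_factory=list)


def build_one_hop_query(entity_id: str, property_id: str) -> str:
    """Build a SPARQL query for a single (entity, property, ?answer) triple pattern."""
    # Doubled braces in the f-string emit literal { and } in the query text.
    return f"SELECT ?answer WHERE {{ wd:{entity_id} wdt:{property_id} ?answer }}"


# Illustrative example: Q142 is France, P36 is "capital", Q90 is Paris.
record = KbqaRecord(
    question_ru="Какой город является столицей Франции?",
    question_en="Which city is the capital of France?",
    sparql=build_one_hop_query("Q142", "P36"),
    answers=["Q90"],
)
print(record.sparql)
```

Questions of greater complexity would need more than one triple pattern, which this one-hop sketch does not cover.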

Code Implementations: 1 repo