CL, AI | Aug 29, 2023

KGConv, a Conversational Corpus grounded in Wikidata

Quentin Brabant, Gwenole Lecorve, Lina M. Rojas-Barahona, Claire Gardent

arXiv:2308.15298v1 | 16.881 citations | h-index: 19

Originality: Synthesis-oriented

AI Analysis

This work addresses the need for large-scale, structured conversational datasets in natural language processing and knowledge-based AI research. Its contribution is incremental in that it builds on existing Wikidata resources.

The authors introduce KGConv, a large conversational corpus of 71k conversations grounded in Wikidata facts. Each conversation averages 8.6 questions, and each fact comes with multiple question variants (12 on average) produced by templates, human annotation, hand-crafted rules and a neural question rewriting model. They establish baselines for Knowledge-Based, Conversational Question Generation and highlight potential applications in other generation and analysis tasks.

We present KGConv, a large conversational corpus of 71k conversations in which each question-answer pair is grounded in a Wikidata fact. Conversations contain 8.6 questions on average and, for each Wikidata fact, we provide multiple variants (12 on average) of the corresponding question using templates, human annotations, hand-crafted rules and a neural question rewriting model. We provide baselines for the task of Knowledge-Based, Conversational Question Generation. KGConv can further be used for other generation and analysis tasks such as single-turn question generation from Wikidata triples, question rewriting, question answering from conversations or from knowledge graphs, and quiz generation.
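To make the structure described in the abstract concrete, the following is a minimal Python sketch of how one grounded conversation could be represented in memory. The class names, field names, and helper functions are assumptions made for illustration and do not reflect the corpus's actual file format or schema; the example turns are invented and are not taken from KGConv.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class Turn:
    """One question-answer pair grounded in a single fact (hypothetical layout)."""
    triple: tuple[str, str, str]   # (subject, property, object), given as plain labels
    question_variants: list[str]   # several phrasings of the same question
    answer: str


@dataclass
class Conversation:
    """An ordered sequence of grounded turns (hypothetical layout)."""
    turns: list[Turn] = field(default_factory=list)


def mean_questions_per_conversation(conversations):
    # Corpus-level statistic: average number of turns per conversation.
    return mean(len(c.turns) for c in conversations)


def mean_variants_per_fact(conversations):
    # Corpus-level statistic: average number of question variants per grounded fact.
    return mean(len(t.question_variants) for c in conversations for t in c.turns)


# Invented example, for illustration only.
example = Conversation(turns=[
    Turn(
        triple=("Marie Curie", "place of birth", "Warsaw"),
        question_variants=[
            "Where was Marie Curie born?",
            "What is Marie Curie's place of birth?",
        ],
        answer="Warsaw",
    ),
    Turn(
        triple=("Marie Curie", "field of work", "physics"),
        question_variants=["What field did she work in?"],
        answer="physics",
    ),
])
```

Applied to the full corpus, the two helper functions would correspond to the reported averages of 8.6 questions per conversation and 12 variants per fact. Note that the second turn's only variant uses a pronoun, which is the kind of context-dependent phrasing that conversational question generation has to handle.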
