CL | Mar 13, 2021

ParaQA: A Question Answering Dataset with Paraphrase Responses for Single-Turn Conversation

Endri Kacupaj, Barshana Banerjee, Kuldeep Singh, Jens Lehmann

arXiv:2103.07771v12.022 citations | Has Code

Originality: Synthesis-oriented

AI Analysis

It addresses the lack of diverse answer verbalizations in conversational QA datasets, an incremental contribution relevant to researchers in natural language processing and knowledge graph applications.

The paper introduces ParaQA, a dataset with 5000 question-answer pairs featuring multiple paraphrased responses for single-turn conversation over knowledge graphs, and demonstrates its advantage using metrics like BLEU and METEOR.

This paper presents ParaQA, a question answering (QA) dataset with multiple paraphrased responses for single-turn conversation over knowledge graphs (KG). The dataset was created using a semi-automated framework that generates diverse paraphrases of the answers using techniques such as back-translation. Existing datasets for conversational question answering over KGs (single-turn/multi-turn) focus on question paraphrasing and provide at most one answer verbalization. In contrast, ParaQA contains 5000 question-answer pairs with a minimum of two and a maximum of eight unique paraphrased responses for each question. We complement the dataset with baseline models and illustrate the advantage of having multiple paraphrased answers through commonly used metrics such as BLEU and METEOR. The ParaQA dataset is publicly available on a persistent URI for broader usage and adaptation in the research community.
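To make the point about multiple paraphrased answers concrete, here is a minimal sketch of why a BLEU-style score can rise when more reference verbalizations are available. It is a simplified, unsmoothed BLEU over unigrams and bigrams, not the authors' evaluation script; the function names (`ngrams`, `modified_precision`, `simple_bleu`) and the example sentences are hypothetical and are not taken from ParaQA.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(hypothesis, references, n):
    """Clipped n-gram precision: each hypothesis n-gram is credited at most
    as many times as it occurs in the single most generous reference."""
    hyp_counts = ngrams(hypothesis, n)
    if not hyp_counts:
        return 0.0
    max_ref_counts = Counter()
    for ref in references:
        for gram, count in ngrams(ref, n).items():
            max_ref_counts[gram] = max(max_ref_counts[gram], count)
    clipped = sum(min(count, max_ref_counts[gram])
                  for gram, count in hyp_counts.items())
    return clipped / sum(hyp_counts.values())


def simple_bleu(hypothesis, references, max_n=2):
    """Simplified BLEU (assumption: uniform weights, no smoothing)."""
    precisions = [modified_precision(hypothesis, references, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0
    # Brevity penalty uses the reference length closest to the hypothesis.
    closest_len = min((len(r) for r in references),
                      key=lambda l: (abs(l - len(hypothesis)), l))
    bp = 1.0 if len(hypothesis) >= closest_len else math.exp(
        1 - closest_len / len(hypothesis))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)


# Hypothetical generated answer and reference verbalizations.
hypothesis = "the capital of france is paris".split()
single_ref = ["paris is the capital of france".split()]
multi_ref = single_ref + ["the capital of france is paris".split()]

single_score = simple_bleu(hypothesis, single_ref)
multi_score = simple_bleu(hypothesis, multi_ref)
print(f"one reference:  {single_score:.3f}")
print(f"two references: {multi_score:.3f}")
```

With a single reference, a correct answer phrased differently loses bigram matches; adding a second paraphrase as a reference lets the same answer be credited in full. For reported results, an established implementation of BLEU and METEOR would be preferable to this sketch.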

