SelQA: A New Benchmark for Selection-based Question Answering
This work addresses the need for diverse and challenging datasets in question answering research, though its contribution is incremental because it builds on existing dataset creation methods.
The authors introduced SelQA, a new selection-based question answering dataset generated via crowdsourcing from Wikipedia topics, with an annotation scheme designed to reduce word co-occurrences between questions and answers. They provided baseline results for answer sentence selection and answer triggering tasks, offering a benchmark for future improvements.
This paper presents a new selection-based question answering dataset, SelQA. The dataset consists of questions generated through crowdsourcing and sentence-length answers that are drawn from the ten most prevalent topics in the English Wikipedia. We introduce a corpus annotation scheme that enhances the generation of large, diverse, and challenging datasets by explicitly aiming to reduce word co-occurrences between questions and answers. Our annotation scheme is composed of a series of crowdsourcing tasks, with a view to utilizing crowdsourcing more effectively in the creation of question answering datasets in various domains. Several systems are compared on the tasks of answer sentence selection and answer triggering, providing strong baseline results for future work to improve upon.
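To make the two tasks and the role of word co-occurrence concrete, the following is a minimal sketch, not one of the systems compared in the paper. It assumes a crude word-overlap score (the fraction of question word types that reappear in a candidate sentence); the function names, the tokenizer, and the 0.5 threshold are all hypothetical choices made for illustration. Answer sentence selection picks the best-scoring candidate, while answer triggering may also decline to answer when no candidate scores high enough. A dataset built to reduce question-answer co-occurrence is one on which a purely lexical scorer like this should perform poorly.

```python
import re


def tokens(text):
    """Lowercase alphanumeric word types (a deliberately crude tokenizer)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap(question, sentence):
    """Fraction of question word types that also occur in the sentence."""
    q = tokens(question)
    return len(q & tokens(sentence)) / len(q) if q else 0.0


def select_answer(question, candidates):
    """Answer sentence selection: index and score of the best candidate.

    Assumes at least one candidate sentence is given.
    """
    scores = [overlap(question, c) for c in candidates]
    best = max(range(len(candidates)), key=scores.__getitem__)
    return best, scores[best]


def trigger_answer(question, candidates, threshold=0.5):
    """Answer triggering: best index, or None if nothing clears the threshold.

    The threshold value is an illustrative assumption, not a tuned setting.
    """
    if not candidates:
        return None
    best, score = select_answer(question, candidates)
    return best if score >= threshold else None
```

For example, given the question "Who founded the company?" and the candidates "The company was founded by Jane Doe in 1990." and "It is based in Paris.", `select_answer` returns the first candidate, and `trigger_answer` returns `None` if only the second candidate is available.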