HC, AI, CL, ML | Jul 19, 2017

Crowdsourcing Multiple Choice Science Questions

arXiv:1707.06209v1 | 1243 citations
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of creating diverse and relevant educational content for science learning, though the advance is incremental because it builds on existing crowdsourcing and dataset-construction methods.

The paper tackles the problem of generating high-quality, domain-targeted multiple choice science questions through crowdsourcing. The result is SciQ, a dataset of 13.7K questions that improves accuracy on real science exams when used as additional training data alongside existing questions.

We present a novel method for obtaining high-quality, domain-targeted multiple choice questions from crowd workers. Generating these questions can be difficult without trading away originality, relevance or diversity in the answer options. Our method addresses these problems by leveraging a large corpus of domain-specific text and a small set of existing questions. It produces model suggestions for document selection and answer distractor choice which aid the human question generation process. With this method we have assembled SciQ, a dataset of 13.7K multiple choice science exam questions (Dataset available at http://allenai.org/data.html). We demonstrate that the method produces in-domain questions by providing an analysis of this new dataset and by showing that humans cannot distinguish the crowdsourced questions from original questions. When using SciQ as additional training data to existing questions, we observe accuracy improvements on real science exams.
