CL · Oct 12, 2022

EduQG: A Multi-format Multiple Choice Dataset for the Educational Domain

arXiv:2210.06104v1 · 27 citations · h-index: 19
Originality: Synthesis-oriented
AI Analysis

The dataset is a valuable resource for researchers in educational AI, though the contribution is incremental because it focuses on dataset creation rather than novel methods.

The authors tackled the lack of high-quality educational datasets by introducing EduQG, a dataset with 3,397 samples including multiple-choice questions, answers, distractors, and source documents, which shows distinguishable differences from existing datasets and can support research in question generation and related tasks.

We introduce a high-quality dataset that contains 3,397 samples comprising (i) multiple choice questions, (ii) answers (including distractors), and (iii) their source documents, from the educational domain. Each question is phrased in two forms, normal and cloze. Correct answers are linked to source documents with sentence-level annotations. Thus, our versatile dataset can be used for both question and distractor generation, as well as to explore new challenges such as question format conversion. Furthermore, 903 questions are accompanied by their cognitive complexity level as per Bloom's taxonomy. All questions have been generated by educational experts rather than crowd workers to ensure that they maintain educational and learning standards. Our analysis and experiments suggest distinguishable differences between our dataset and those commonly used for question generation for educational purposes. We believe this new dataset can serve as a valuable resource for research and evaluation in the educational domain. The dataset and baselines will be released to support further research in question generation.
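To make the sample structure described in the abstract concrete, here is a minimal sketch of how one record might be represented. The class name, field names, and example content are all hypothetical and are not taken from the released dataset or its schema; only the kinds of fields (two question forms, answer, distractors, source document, sentence-level links, optional Bloom's level) follow the abstract.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class EduQGSample:
    """Hypothetical record layout; field names are illustrative assumptions."""
    question_normal: str             # question phrased as a regular question
    question_cloze: str              # same question phrased as fill-in-the-blank
    correct_answer: str
    distractors: List[str]           # incorrect answer options
    source_sentences: List[str]      # source document, split into sentences
    answer_sentence_ids: List[int]   # sentence-level links for the correct answer
    bloom_level: Optional[str] = None  # present only for a subset of questions

    def options(self) -> List[str]:
        """Return all answer options, correct answer first."""
        return [self.correct_answer, *self.distractors]

    def supporting_sentences(self) -> List[str]:
        """Return the source sentences annotated as supporting the answer."""
        return [self.source_sentences[i] for i in self.answer_sentence_ids]


# Made-up example content, not an actual sample from the dataset.
sample = EduQGSample(
    question_normal="Which organelle produces most of the cell's ATP?",
    question_cloze="Most of the cell's ATP is produced by the _____.",
    correct_answer="Mitochondria",
    distractors=["Ribosomes", "Nucleus", "Golgi apparatus"],
    source_sentences=[
        "Cells are the basic unit of life.",
        "Mitochondria produce most of the cell's ATP.",
        "Ribosomes synthesize proteins.",
    ],
    answer_sentence_ids=[1],
)

print(sample.options())
print(sample.supporting_sentences())
```

Keeping both question forms on the same record is one way to support the format-conversion task the abstract mentions, since each record then provides an aligned normal/cloze pair.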

Code Implementations: 1 repo