CL | May 9, 2025

Tell Me Who Your Students Are: GPT Can Generate Valid Multiple-Choice Questions When Students' (Mis)Understanding Is Hinted

Machi Shimmei, Masaki Uto, Yuichiroh Matsubayashi, Kentaro Inui, Aditi Mallavarapu, Noboru Matsuda

arXiv:2505.05815v2 | 4.91 citations | h-index: 13

Originality: Incremental advance

AI Analysis

This work addresses the need for automated, high-quality assessment generation in education. The advance is incremental because it builds on existing AI methods.

The study tackles the problem of generating valid multiple-choice questions (MCQs) with AnaQuest, a prompting technique that uses students' free-text responses to create correct and incorrect assertions. The resulting MCQs resembled human-crafted items in difficulty and discrimination more closely than those produced by a baseline ChatGPT prompt.

The primary goal of this study is to develop and evaluate an innovative prompting technique, AnaQuest, for generating multiple-choice questions (MCQs) using a pre-trained large language model. In AnaQuest, the choice items are sentence-level assertions about complex concepts. The technique integrates formative and summative assessments. In the formative phase, students answer open-ended questions for target concepts in free text. For summative assessment, AnaQuest analyzes these responses to generate both correct and incorrect assertions. To evaluate the validity of the generated MCQs, Item Response Theory (IRT) was applied to compare item characteristics between MCQs generated by AnaQuest, a baseline ChatGPT prompt, and human-crafted items. An empirical study found that expert instructors rated MCQs generated by both AI models to be as valid as those created by human instructors. However, IRT-based analysis revealed that AnaQuest-generated questions, particularly those with incorrect assertions (foils), more closely resembled human-crafted items in terms of difficulty and discrimination than those produced by ChatGPT.
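For readers unfamiliar with the two item characteristics named above, the following is a minimal sketch of how difficulty and discrimination enter an IRT model. It assumes the common two-parameter logistic (2PL) form; the abstract does not say which IRT model the authors fitted, and the function name `p_correct` and the item parameters below are hypothetical, chosen only for illustration.

```python
import math


def p_correct(theta: float, a: float, b: float) -> float:
    """2PL item response function (assumed model, not confirmed by the source).

    theta: student ability
    a:     item discrimination (how sharply the item separates abilities)
    b:     item difficulty (ability at which P(correct) = 0.5)
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


# Hypothetical items as (discrimination, difficulty) pairs; not data from the paper.
items = {
    "easy, weakly discriminating": (0.5, -1.0),
    "hard, sharply discriminating": (2.0, 1.0),
}

for name, (a, b) in items.items():
    # Probability of a correct answer for low, average, and high ability.
    probs = [round(p_correct(theta, a, b), 3) for theta in (-2.0, 0.0, 2.0)]
    print(name, probs)
```

Under this model, comparing generated and human-crafted items amounts to comparing their estimated `a` and `b` values: a higher `b` shifts the curve toward stronger students, and a higher `a` makes the curve steeper around `b`.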
