CL CY LG Jul 7, 2025

SMART: Simulated Students Aligned with Item Response Theory for Question Difficulty Prediction

Alexander Scarlatos, Nigel Fernandez, Christopher Ormerod, Susan Lottridge, Andrew Lan

arXiv:2507.05129v218.817 citations h-index: 10 EMNLP

Originality: Incremental advance

AI Analysis

This work addresses the cold-start issue in educational assessment for previously unseen questions, enabling more efficient and personalized learning, though it is incremental as it builds on existing IRT and simulation techniques.

The paper predicts question difficulty in educational assessments without requiring real student responses. It aligns simulated students with an instructed ability level using direct preference optimization, scores their generated responses with an LLM-based scoring model, and reports better performance than existing methods on two real-world datasets.

Item (question) difficulties play a crucial role in educational assessments, enabling accurate and efficient assessment of student abilities and personalization to maximize learning outcomes. Traditionally, estimating item difficulties can be costly, requiring real students to respond to items, followed by fitting an item response theory (IRT) model to get difficulty estimates. This approach cannot be applied to the cold-start setting for previously unseen items either. In this work, we present SMART (Simulated Students Aligned with IRT), a novel method for aligning simulated students with instructed ability, which can then be used in simulations to predict the difficulty of open-ended items. We achieve this alignment using direct preference optimization (DPO), where we form preference pairs based on how likely responses are under a ground-truth IRT model. We perform a simulation by generating thousands of responses, evaluating them with a large language model (LLM)-based scoring model, and fitting the resulting data to an IRT model to obtain item difficulty estimates. Through extensive experiments on two real-world student response datasets, we show that SMART outperforms other item difficulty prediction methods by leveraging its improved ability alignment.
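The two IRT-related steps in the abstract (ranking candidate responses by their likelihood under an IRT model to form preference pairs, then fitting simulated scores to recover an item difficulty) can be illustrated with a minimal sketch. It rests on assumptions that are not from the paper: a one-parameter (Rasch) model with binary scores, known student abilities, and randomly drawn scores standing in for LLM-generated and LLM-scored responses. All function names are hypothetical, and the IRT model, pair construction, and fitting procedure SMART actually uses may differ.

```python
import math
import random


def p_correct(ability, difficulty):
    """Rasch (1PL) probability of a correct response (assumed model)."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def response_likelihood(score, ability, difficulty):
    """Likelihood of a binary score (1 = correct, 0 = incorrect) under the model."""
    p = p_correct(ability, difficulty)
    return p if score == 1 else 1.0 - p


def make_preference_pair(candidates, ability, difficulty):
    """Pick (chosen, rejected) response texts for one instructed ability.

    `candidates` is a list of (response_text, score) tuples. The response
    whose score is most likely under the IRT model is preferred. Returns
    None when all candidates are equally likely.
    """
    ranked = sorted(
        candidates,
        key=lambda c: response_likelihood(c[1], ability, difficulty),
        reverse=True,
    )
    best, worst = ranked[0], ranked[-1]
    if best[1] == worst[1]:
        return None  # same score, so no preference signal
    return best[0], worst[0]


def estimate_difficulty(abilities, scores, lo=-6.0, hi=6.0, iters=60):
    """Maximum-likelihood Rasch difficulty for one item, abilities known.

    The likelihood is maximized where the expected number of correct
    responses equals the observed number. Expected correct decreases as
    difficulty increases, so bisection on [lo, hi] finds the root. If all
    scores are identical the estimate saturates at one of the bounds.
    """
    observed = sum(scores)
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        expected = sum(p_correct(a, mid) for a in abilities)
        if expected > observed:
            lo = mid  # item looks too easy at `mid`; raise difficulty
        else:
            hi = mid
    return (lo + hi) / 2.0


# Toy simulation: random draws stand in for simulated students' scored responses.
rng = random.Random(0)
true_difficulty = 0.8
abilities = [rng.gauss(0.0, 1.0) for _ in range(5000)]
scores = [1 if rng.random() < p_correct(a, true_difficulty) else 0 for a in abilities]
estimated_difficulty = estimate_difficulty(abilities, scores)

candidates = [("correct answer", 1), ("wrong answer", 0)]
pair_high_ability = make_preference_pair(candidates, ability=2.0, difficulty=0.0)
pair_low_ability = make_preference_pair(candidates, ability=-2.0, difficulty=0.0)
```

In this toy setup a high-ability simulated student prefers the correct response and a low-ability one prefers the incorrect response, which is the intuition behind conditioning preference pairs on instructed ability. With thousands of simulated responses the recovered difficulty lands close to the value used to generate the data.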
