CL CY LG | Jul 7, 2025

SMART: Simulated Students Aligned with Item Response Theory for Question Difficulty Prediction

arXiv:2507.05129v2 | 17 citations | h-index: 10 | EMNLP
Originality: Incremental advance
AI Analysis

This work addresses the cold-start issue in educational assessment for previously unseen questions, enabling more efficient and personalized learning, though it is incremental as it builds on existing IRT and simulation techniques.

The paper predicts question difficulty in educational assessments without requiring real student responses. It aligns simulated students with an instructed ability level using direct preference optimization, scores their responses with an LLM-based scoring model, and outperforms existing methods on two real-world datasets.

Item (question) difficulties play a crucial role in educational assessments, enabling accurate and efficient assessment of student abilities and personalization to maximize learning outcomes. Traditionally, estimating item difficulties is costly: real students must respond to the items, and an item response theory (IRT) model is then fit to their responses to obtain difficulty estimates. This approach also cannot be applied in the cold-start setting, where items are previously unseen and have no responses. In this work, we present SMART (Simulated Students Aligned with IRT), a novel method for aligning simulated students with instructed ability, which can then be used in simulations to predict the difficulty of open-ended items. We achieve this alignment using direct preference optimization (DPO), where we form preference pairs based on how likely responses are under a ground-truth IRT model. We perform a simulation by generating thousands of responses, evaluating them with a large language model (LLM)-based scoring model, and fitting an IRT model to the resulting data to obtain item difficulty estimates. Through extensive experiments on two real-world student response datasets, we show that SMART outperforms other item difficulty prediction methods by leveraging its improved ability alignment.
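The preference-pair idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a Rasch (1PL) model with binary scores, whereas the abstract does not say which IRT model is used, and the names `p_correct`, `score_likelihood`, and `make_preference_pair` are hypothetical. Given candidate responses that have already been scored, the response whose score is most likely for the instructed ability becomes the "chosen" side of a DPO pair and the least likely becomes the "rejected" side.

```python
import math


def p_correct(ability: float, difficulty: float) -> float:
    """Rasch (1PL) probability of a correct answer (assumed model, not from the paper)."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def score_likelihood(score: int, ability: float, difficulty: float) -> float:
    """Likelihood of a binary score (1 = correct, 0 = incorrect) under the Rasch model."""
    p = p_correct(ability, difficulty)
    return p if score == 1 else 1.0 - p


def make_preference_pair(candidates, ability, difficulty):
    """Pick (chosen, rejected) response texts for one simulated student and item.

    candidates: list of (response_text, score) tuples, already scored.
    Returns None when every candidate is equally likely, since such a set
    carries no preference signal.
    """
    ranked = sorted(
        candidates, key=lambda c: score_likelihood(c[1], ability, difficulty)
    )
    worst, best = ranked[0], ranked[-1]
    if score_likelihood(best[1], ability, difficulty) == score_likelihood(
        worst[1], ability, difficulty
    ):
        return None
    return best[0], worst[0]


# Hypothetical item with one correct and one incorrect candidate response.
candidates = [("x = 4", 1), ("x = 7", 0)]
low_ability_pair = make_preference_pair(candidates, ability=-2.0, difficulty=1.0)
high_ability_pair = make_preference_pair(candidates, ability=3.0, difficulty=1.0)
```

For the low-ability student the incorrect response is the preferred one, because an incorrect score is more likely at that ability; for the high-ability student the preference flips. Pairs built this way would then feed a standard DPO training loop, which is omitted here.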

