MCQ Difficulty Prediction via Modeling Learner Heterogeneity Using Data-Driven Cognitive Profiling
For educational assessment researchers, this work provides a method to incorporate learner heterogeneity into difficulty prediction, offering interpretable insights into item difficulty.
The paper tackles MCQ difficulty prediction by modeling heterogeneous student misconceptions via data-driven cognitive profiling. Their persona-driven framework improves MSE from 0.367 to 0.274 and R2 from 0.525 to 0.686 over a recent baseline.
Predicting the difficulty of multiple-choice questions (MCQs) is important for effective assessment, yet current methods typically assume a unimodal student ability distribution, overlooking the heterogeneous nature of student misconceptions. We propose a persona-driven framework that replaces theoretical ability sampling with data-driven cognitive profiling. Using student interactions from the EEDI dataset, we identify behavioral personas via latent class analysis (LCA), then condition a large language model (LLM) to simulate response distributions for each persona. These signals are aggregated with topic context and fed into a Ridge Regression model to predict the item response theory (IRT) difficulty parameter. With five-fold cross-validation, our method improves over a recent baseline (MSE: 0.367 to 0.274; R2: 0.525 to 0.686). The discovered personas are interpretable and offer insights into why items are difficult, with potential applications to diagnostic assessment design.