CY AI LG, Apr 13

MCQ Difficulty Prediction via Modeling Learner Heterogeneity Using Data-Driven Cognitive Profiling

arXiv:2605.16290
86.7
AI Analysis

For educational assessment researchers, this work provides a method to incorporate learner heterogeneity into difficulty prediction, offering interpretable insights into item difficulty.

The paper tackles MCQ difficulty prediction by modeling heterogeneous student misconceptions through data-driven cognitive profiling. Its persona-driven framework reduces MSE from 0.367 to 0.274 and raises R² from 0.525 to 0.686 relative to a recent baseline.

Predicting the difficulty of multiple-choice questions (MCQs) is important for effective assessment, yet current methods typically assume a unimodal student ability distribution, overlooking the heterogeneous nature of student misconceptions. We propose a persona-driven framework that replaces theoretical ability sampling with data-driven cognitive profiling. Using student interactions from the EEDI dataset, we identify behavioral personas via latent class analysis (LCA), then condition a large language model (LLM) to simulate response distributions for each persona. These signals are aggregated with topic context and fed into a Ridge Regression model to predict the item response theory (IRT) difficulty parameter. With five-fold cross-validation, our method improves over a recent baseline (MSE: 0.367 to 0.274; R²: 0.525 to 0.686). The discovered personas are interpretable and offer insights into why items are difficult, with potential applications to diagnostic assessment design.
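The last stage of the pipeline described in the abstract (persona-level signals plus topic context, fed to ridge regression, scored with five-fold cross-validation) can be illustrated with a minimal sketch. This is not the authors' code. The persona correctness rates, persona prevalences, topic encoding, and target difficulties below are synthetic placeholders standing in for the LCA and LLM outputs, and every function name is hypothetical. The ridge fit is a plain closed-form solve in pure Python so the block has no dependencies.

```python
import random

def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ridge_fit(xs, ys, alpha):
    """Closed-form ridge: (X^T X + alpha * I) w = X^T y, intercept unpenalised."""
    rows = [[1.0] + list(x) for x in xs]
    d = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(d)] for i in range(d)]
    for i in range(1, d):  # index 0 is the intercept, so it is not penalised
        xtx[i][i] += alpha
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(d)]
    return solve(xtx, xty)

def ridge_predict(w, x):
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))

def item_features(persona_p_correct, topic_one_hot):
    """Concatenate per-persona simulated correctness rates with topic context."""
    return list(persona_p_correct) + list(topic_one_hot)

def cross_validate(xs, ys, k=5, alpha=1.0):
    """k-fold CV; returns (MSE, R^2) computed over the out-of-fold predictions."""
    n = len(xs)
    preds = [0.0] * n
    for fold in range(k):
        test_idx = set(range(fold, n, k))
        train = [i for i in range(n) if i not in test_idx]
        w = ridge_fit([xs[i] for i in train], [ys[i] for i in train], alpha)
        for i in test_idx:
            preds[i] = ridge_predict(w, xs[i])
    ss_res = sum((p - y) ** 2 for p, y in zip(preds, ys))
    mean_y = sum(ys) / n
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return ss_res / n, 1.0 - ss_res / ss_tot

# Synthetic stand-in data (assumption): 3 personas, 2 topics, 80 items.
rng = random.Random(0)
prevalence = [0.5, 0.3, 0.2]  # hypothetical persona shares
features, difficulties = [], []
for _ in range(80):
    p_correct = [rng.uniform(0.1, 0.9) for _ in prevalence]
    topic = rng.randrange(2)
    one_hot = [1.0 if t == topic else 0.0 for t in range(2)]
    expected = sum(s * p for s, p in zip(prevalence, p_correct))
    # Items that more learners get right receive a lower synthetic difficulty.
    b = -3.0 * (expected - 0.5) + 0.3 * topic + rng.gauss(0.0, 0.1)
    features.append(item_features(p_correct, one_hot))
    difficulties.append(b)

mse, r2 = cross_validate(features, difficulties, k=5, alpha=0.1)
print(f"MSE={mse:.3f}  R2={r2:.3f}")
```

Because the targets are generated from the features, the scores printed here say nothing about the paper's results; the sketch only shows how per-persona signals and topic context can be combined into one feature vector per item and evaluated out of fold.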

