CY, AI · Mar 30

A Framework for Human-AI Q-Matrix Refinement: A NeuralCDM Evaluation

arXiv:2604.16398 · 87.6 · h-index: 7
AI Analysis

For educational assessment and learning analytics, this framework reduces expert effort and subjectivity in Q-matrix construction while improving empirical validity.

The paper proposes a human-AI framework in which LLMs generate candidate Q-matrices and NeuralCDM evaluates them. On a thermodynamics dataset, iteratively refined candidates fit student response data better than the expert baseline (AUC 0.780 vs. 0.717), and locally deployed LLMs perform comparably to cloud APIs, which supports privacy-preserving deployment.

Q-matrices are a cornerstone of theory-driven assessment and learning analytics, making item demands and students' underlying knowledge components and misconceptions explicit and actionable. However, Q-matrices are typically crafted by experts, making them time-consuming to build, prone to subjectivity, and difficult to validate empirically. We propose a framework for human-AI Q-matrix refinement in which large language models (LLMs) generate candidate Q-matrices using structured, misconception-aware prompting, and NeuralCDM provides an empirical evaluation layer to compare candidates based on how well they explain student response data. We apply the framework to a thermodynamics assessment dataset and benchmark locally deployed LLMs against cloud-served models. Results show that iteratively refined LLM-generated Q-matrices can exceed expert-baseline model fit (AUC 0.780 vs. 0.717), and that locally deployed models achieve comparable performance to cloud APIs, supporting privacy-preserving deployment.
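The generate-then-evaluate loop described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: a simple DINA-style conjunctive rule stands in for NeuralCDM, student mastery profiles are assumed to be known rather than estimated, and the names `predict`, `rank_candidates`, the `guess`/`slip` values, and all toy data are hypothetical. It only shows what a Q-matrix looks like as a data structure and how candidates could be ranked by AUC against response data.

```python
from typing import Dict, List, Sequence, Tuple

# A Q-matrix is a binary table: rows are items, columns are knowledge components.
QMatrix = List[List[int]]


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Pairwise AUC: chance that a random correct response outscores
    a random incorrect one (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def predict(q: QMatrix, mastery: List[List[int]],
            guess: float = 0.2, slip: float = 0.1) -> List[float]:
    """ASSUMPTION: DINA-style stand-in for NeuralCDM. A student is predicted
    to answer correctly (1 - slip) only if they master every component the
    Q-matrix assigns to the item; otherwise the prediction is `guess`."""
    preds = []
    for student in mastery:
        for item in q:
            has_all = all(m >= r for m, r in zip(student, item))
            preds.append(1 - slip if has_all else guess)
    return preds


def rank_candidates(candidates: Dict[str, QMatrix],
                    mastery: List[List[int]],
                    responses: List[List[int]]) -> List[Tuple[str, float]]:
    """Score each candidate Q-matrix by AUC and sort best-first."""
    flat = [r for row in responses for r in row]  # student-major, same order as predict()
    scored = [(name, auc(flat, predict(q, mastery))) for name, q in candidates.items()]
    return sorted(scored, key=lambda t: t[1], reverse=True)


# Toy data (hypothetical): 4 students, 3 items, 2 knowledge components.
mastery = [[1, 0], [0, 1], [1, 1], [0, 0]]
responses = [[1, 0, 0], [0, 1, 0], [1, 1, 1], [0, 0, 0]]
candidates = {
    "expert": [[1, 0], [1, 0], [1, 1]],   # item 2 tagged with the wrong component
    "refined": [[1, 0], [0, 1], [1, 1]],  # item 2 re-tagged after refinement
}

ranking = rank_candidates(candidates, mastery, responses)
for name, score in ranking:
    print(f"{name}: AUC = {score:.3f}")
```

In this toy setup the candidate whose item tags match the response pattern ranks first. In the framework described above, the evaluation layer would instead be a trained NeuralCDM model, and the ranking would feed back into the next round of LLM prompting and expert review.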

