AI · CL · Apr 19, 2025

Assessing AI-Generated Questions' Alignment with Cognitive Frameworks in Educational Assessment

Antoun Yaacoub, Jérôme Da-Rugna, Zainab Assaghir

arXiv:2504.14232v2 · 13.6 · 15 citations · h-index: 9 · Int J Comput Theory Eng

Originality: Synthesis-oriented

AI Analysis

This work addresses the problem of automating educational assessment for educators and learners. It is incremental, however, because it applies existing methods to a specific domain.

This study evaluated whether incorporating Bloom's Taxonomy into an AI tool for generating multiple-choice questions improves their alignment with cognitive objectives. It found that a Transformer-based model (DistilBERT) achieved 91% validation accuracy in classifying questions by cognitive level.
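To make "classifying questions by cognitive level" concrete, here is a minimal sketch, not the authors' code, of multinomial Naive Bayes, one of the simpler baselines the paper compares against DistilBERT. It is written from scratch in standard-library Python, and it reports both overall and per-level accuracy. The class and helper names (`NaiveBayesBloom`, `tokenize`, `per_level_accuracy`) and the toy questions and labels are hypothetical. The paper's 3691-question dataset and its feature extraction are not reproduced here.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    # Lowercase word tokens; a deliberately simple stand-in for a real tokenizer.
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesBloom:
    """Multinomial Naive Bayes over bag-of-words counts, with Laplace smoothing."""

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha

    def fit(self, texts: list[str], labels: list[str]) -> "NaiveBayesBloom":
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)  # level -> token -> count
        self.vocab: set[str] = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def predict(self, text: str) -> str:
        n_docs = sum(self.class_counts.values())
        # Tokens never seen in training are ignored, as a fitted vectorizer would.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        best_label, best_score = "", -math.inf
        for label, n in self.class_counts.items():
            denom = self.total_words[label] + self.alpha * len(self.vocab)
            # log P(level) + sum over tokens of log P(token | level)
            score = math.log(n / n_docs)
            for tok in tokens:
                score += math.log((self.word_counts[label][tok] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def per_level_accuracy(y_true: list[str], y_pred: list[str]) -> dict[str, float]:
    # Fraction of questions at each true Bloom level that were labelled correctly.
    hits, totals = Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        hits[t] += t == p
    return {level: hits[level] / totals[level] for level in totals}


# Hypothetical toy data for illustration only.
train = [
    ("Define the term photosynthesis", "Knowledge"),
    ("List the parts of a cell", "Knowledge"),
    ("Define the term osmosis", "Knowledge"),
    ("Explain why the sky is blue", "Comprehension"),
    ("Summarize the main idea", "Comprehension"),
    ("Explain how a cell divides", "Comprehension"),
    ("Justify your choice of algorithm", "Evaluation"),
    ("Evaluate and justify the proposed design", "Evaluation"),
    ("Evaluate which model is better", "Evaluation"),
]
held_out = [
    ("Define the term entropy", "Knowledge"),
    ("Explain why the tide rises", "Comprehension"),
    ("Explain the term osmosis", "Comprehension"),
    ("Justify your choice of data structure", "Evaluation"),
    ("Evaluate which approach is better", "Evaluation"),
]

model = NaiveBayesBloom().fit([q for q, _ in train], [lvl for _, lvl in train])
y_true = [lvl for _, lvl in held_out]
y_pred = [model.predict(q) for q, _ in held_out]
overall = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(per_level_accuracy(y_true, y_pred), overall)
```

On this toy split, the only miss is "Explain the term osmosis". It shares "term" and "osmosis" with Knowledge examples and is predicted as Knowledge, so Comprehension accuracy falls to 0.5 while overall accuracy is 0.8. This shows in miniature how bag-of-words baselines can blur adjacent Bloom levels. Models that read context, such as DistilBERT, are aimed at exactly this weakness.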

This study evaluates the integration of Bloom's Taxonomy into OneClickQuiz, an Artificial Intelligence (AI)-driven plugin for automating Multiple-Choice Question (MCQ) generation in Moodle. Bloom's Taxonomy provides a structured framework for categorizing educational objectives into hierarchical cognitive levels. Our research investigates whether incorporating this taxonomy can improve the alignment of AI-generated questions with specific cognitive objectives. We developed a dataset of 3691 questions categorized according to Bloom's levels and evaluated how effectively four classification models categorize them: Multinomial Logistic Regression, Naive Bayes, Linear Support Vector Classification (SVC), and a Transformer-based model (DistilBERT). Our results indicate that higher Bloom's levels generally correlate with greater question length, Flesch-Kincaid Grade Level (FKGL), and Lexical Density (LD), reflecting the greater complexity of higher cognitive demands. Multinomial Logistic Regression showed varying accuracy across Bloom's levels, performing best on "Knowledge" and less accurately on higher-order levels. Merging higher-level categories improved accuracy for complex cognitive tasks. Naive Bayes and Linear SVC also classified lower levels effectively but struggled with higher-order tasks. DistilBERT achieved the highest performance, significantly improving classification of both lower- and higher-order cognitive levels and reaching an overall validation accuracy of 91%. This study highlights the potential of integrating Bloom's Taxonomy into AI-driven assessment tools and underscores the advantages of advanced models like DistilBERT for enhancing educational content generation.
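The complexity measures mentioned in the abstract can be computed per question. The following is a minimal sketch under stated assumptions, not the paper's pipeline. It applies the standard Flesch-Kincaid Grade Level formula, 0.39 × (words / sentences) + 11.8 × (syllables / words) − 15.59, together with a heuristic syllable counter. Lexical density is approximated as the share of tokens outside a small, hypothetical function-word list; real tools usually rely on a part-of-speech tagger to find content words. The helper names (`words`, `sentence_count`, `syllables`, `fkgl`, `lexical_density`) and the two sample questions are illustrative.

```python
import re

# Hypothetical, deliberately short list of function words. Lexical density is
# normally computed with a part-of-speech tagger that identifies content words.
FUNCTION_WORDS = {
    "a", "an", "the", "and", "or", "but", "of", "to", "in", "on", "for", "with",
    "is", "are", "was", "were", "be", "it", "this", "that", "which", "what",
    "why", "how", "your", "you", "its", "by", "as", "at", "from",
}


def words(text: str) -> list[str]:
    return re.findall(r"[A-Za-z']+", text)


def sentence_count(text: str) -> int:
    # Count runs of terminal punctuation; unterminated text counts as one sentence.
    return max(1, len(re.findall(r"[.!?]+", text)))


def syllables(word: str) -> int:
    # Heuristic: count vowel groups, drop a silent final "e", never return < 1.
    w = word.lower()
    count = len(re.findall(r"[aeiouy]+", w))
    if w.endswith("e") and not w.endswith("le") and count > 1:
        count -= 1
    return max(1, count)


def fkgl(text: str) -> float:
    # Flesch-Kincaid Grade Level:
    # 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
    ws = words(text)
    n_syll = sum(syllables(w) for w in ws)
    return 0.39 * len(ws) / sentence_count(text) + 11.8 * n_syll / len(ws) - 15.59


def lexical_density(text: str) -> float:
    # Share of tokens that are content words (here: not in FUNCTION_WORDS).
    ws = [w.lower() for w in words(text)]
    return sum(w not in FUNCTION_WORDS for w in ws) / len(ws)


low_q = "Define the term osmosis."
high_q = ("Evaluate which of the two proposed designs is more maintainable, "
          "and justify your answer.")
for q in (low_q, high_q):
    print(len(words(q)), round(fkgl(q), 2), round(lexical_density(q), 2))
```

In this pair, the Evaluation-style question is longer and has a higher FKGL (about 10.94 versus 6.62). Its approximate lexical density is actually lower (0.57 versus 0.75), however. The abstract's trends are aggregate correlations over the whole dataset, so any single pair of questions may not follow all of them.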
