How Effective is GPT-4 Turbo in Generating School-Level Questions from Textbooks Based on Bloom's Revised Taxonomy?
This addresses the problem of automated question generation for educators, but it is incremental as it builds on existing AI methods for educational tasks.
The study evaluated GPT-4 Turbo's ability to generate educational questions from textbooks in zero-shot mode, finding it effective at producing higher-order thinking questions, especially at the 'understanding' level, but with variations in quality across cognitive levels.
We evaluate the effectiveness of GPT-4 Turbo in generating educational questions from NCERT textbooks in zero-shot mode. Our study highlights GPT-4 Turbo's ability to generate questions that require higher-order thinking skills, especially at the "understanding" level according to Bloom's Revised Taxonomy. While we find a notable consistency between questions generated by GPT-4 Turbo and those assessed by humans in terms of complexity, there are occasional differences. Our evaluation also uncovers variations in how humans and machines evaluate question quality, with a trend inversely related to Bloom's Revised Taxonomy levels. These findings suggest that while GPT-4 Turbo is a promising tool for educational question generation, its efficacy varies across different cognitive levels, indicating a need for further refinement to fully meet educational standards.