Exploring the Capabilities of Prompted Large Language Models in Educational and Assessment Applications
This work addresses the potential for AI to innovate in education and assessment, though its contribution is incremental, since it applies existing prompting techniques to new domains.
The study explores the use of prompted large language models (LLMs) for educational tasks: generating open-ended and multiple-choice questions from textbooks, explaining grammatical errors in Bengali, and assessing HR interview transcripts. It finds that they can perform comparably to human experts in some cases, but with limitations.
In the era of generative artificial intelligence (AI), large language models (LLMs) offer unprecedented opportunities for innovation in modern education. We explore prompted LLMs in educational and assessment applications to uncover their potential. Through a series of research questions, we investigate the effectiveness of prompt-based techniques in generating open-ended questions from school-level textbooks, assess their efficiency in generating open-ended questions from undergraduate-level technical textbooks, and explore the feasibility of a chain-of-thought inspired multi-stage prompting approach for language-agnostic multiple-choice question (MCQ) generation. Additionally, we evaluate prompted LLMs for language learning through a case study in Bengali, a low-resource Indian language, in which the models explain grammatical errors. We also evaluate the potential of prompted LLMs to assess human resource (HR) spoken interview transcripts. By comparing the capabilities of LLMs with those of human experts across these educational tasks and domains, we aim to shed light on the potential and limitations of LLMs in reshaping educational practices.
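To make the idea of multi-stage prompting for MCQ generation concrete, the following is a minimal sketch of how such a pipeline could be chained, with each stage's output fed into the next prompt. It is not the study's implementation: the abstract does not give the stages, so the stage names, the prompt wording, and the `llm` callable (a placeholder for any text-in, text-out model client) are all assumptions made for illustration.

```python
from typing import Callable, Dict

# Hypothetical stage templates. The stage breakdown and wording are
# illustrative assumptions, not taken from the study.
STAGES = [
    ("paraphrase",
     "Paraphrase the following passage in {language}:\n{context}"),
    ("keyword",
     "From this passage, pick one key term to serve as the answer:\n{paraphrase}"),
    ("question",
     "Write a question in {language} about the passage below whose answer "
     "is '{keyword}':\n{paraphrase}"),
    ("distractors",
     "List three plausible but incorrect options in {language} for this "
     "question:\n{question}\nCorrect answer: {keyword}"),
]


def generate_mcq(context: str, language: str,
                 llm: Callable[[str], str]) -> Dict[str, str]:
    """Run the stages in order, passing earlier outputs into later prompts.

    `llm` is a placeholder for any function that maps a prompt string to a
    completion string; no particular model or API is assumed.
    """
    state = {"context": context, "language": language}
    for name, template in STAGES:
        # Each template only refers to fields produced by earlier stages.
        prompt = template.format(**state)
        state[name] = llm(prompt).strip()
    return state
```

Because the target language is only a template parameter, the same chain can be reused across languages, which is one way to read "language-agnostic" here. In practice, `llm` would wrap whichever model client is being evaluated.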