Alexandra Neagu

22.4 · CY · Apr 24

$μ$Ed API: Towards a Shared API for Education Microservices

Maximilian Sölch, Alexandra Neagu, Marcus Messer et al.

Learning at scale often requires domain-specific automation such as assessment and feedback. An organization locked into a general learning platform that lacks these specialist automations is limited in its pedagogical offering. An ecosystem of interoperable, platform-agnostic microservices for domain-specific automation would solve this problem. Developing an effective ecosystem requires a standard interface (API) for education microservices. We propose an initial specification for a standard, platform-independent API for educational microservices, $μ$Ed. The API integrates functionality from existing systems in use at four institutions, which are adopting the new API. The API is initially specified for automation of feedback, assessment, and educational chatbots, with further service types planned. The API specification provided here enables the development of an ecosystem of education microservices that will facilitate automation in more domains and for more users, providing a richer learning experience in a wide range of disciplines.

HC · Feb 20

"How Do I ...?": Procedural Questions Predominate Student-LLM Chatbot Conversations

Alexandra Neagu, Marcus Messer, Peter Johnson et al.

Providing scaffolding through educational chatbots built on Large Language Models (LLMs) has potential risks and benefits that remain an open area of research. When students navigate impasses, they ask for help by formulating impasse-driven questions. Within interactions with LLM chatbots, such questions shape the user prompts and drive the pedagogical effectiveness of the chatbot's response. This paper focuses on such student questions from two datasets of distinct learning contexts: formative self-study and summative assessed coursework. We analysed 6,113 messages from both learning contexts, with 11 different LLMs and three human raters classifying student questions according to four existing schemas. On the feasibility of using LLMs as raters, results showed moderate-to-good inter-rater reliability, with higher consistency than human raters. The data showed that 'procedural' questions predominated in both learning contexts, but more so when students prepare for summative assessment. These results provide a basis on which to use LLMs for classification of student questions. However, we identify clear limitations in both the ability to classify with schemas and the value of doing so: schemas are limited and thus struggle to accommodate the semantic richness of composite prompts, offering only a partial understanding of the wider risks and benefits of chatbot integration. In the future, we recommend an analysis approach that captures the nuanced, multi-turn nature of conversation, for example by applying methods from conversation analysis in discursive psychology.

2 Papers