HCMay 6
Building AI Companions that Prioritise Learning over PerformanceHassan Khosravi, Dragan Gasevic, Shazia Sadiq et al.
Large language models (LLMs) are rapidly transforming knowledge work by improving the quality and efficiency of tasks such as writing, coding, and data analysis. However, their growing use in education has exposed a learning-performance paradox: while they can enhance short-term task performance, they may also undermine genuine learning, including cognitive growth, knowledge transfer, and metacognitive development. This paper addresses the question of how artificial intelligence should be designed and used to support learning rather than merely improve immediate outputs. We introduce the concept of AI learning companions, defined as adaptive, pedagogically informed, LLM-powered agents designed for integration into learning environments. We propose a framework for their design built on three interrelated foundations: a pedagogical foundation focused on how students learn with AI, an adaptive foundation focused on how AI learns about students, and a responsible design foundation ensuring systems remain transparent, accountable, inclusive, and secure. The framework is illustrated through five case studies spanning diverse educational contexts, levels, and tool designs, revealing both the promise and current limitations of existing tools. We conclude that there is a necessary shift away from LLMs designed for task-oriented performance, and beyond simply prompting them to act as tutors, toward deliberately developed AI learning companions that are pedagogically sound, adapt to their learners, and foster durable understanding, metacognitive growth, and learner agency.
CLApr 15, 2023
Robust Educational Dialogue Act Classifiers with Low-Resource and Imbalanced DatasetsJionghao Lin, Wei Tan, Ngoc Dang Nguyen et al.
Dialogue acts (DAs) can represent conversational actions of tutors or students that take place during tutoring dialogues. Automating the identification of DAs in tutoring dialogues is significant to the design of dialogue-based intelligent tutoring systems. Many prior studies employ machine learning models to classify DAs in tutoring dialogues and invest much effort to optimize the classification accuracy by using limited amounts of training data (i.e., low-resource data scenario). However, beyond the classification accuracy, the robustness of the classifier is also important, which can reflect the capability of the classifier on learning the patterns from different class distributions. We note that many prior studies on classifying educational DAs employ cross entropy (CE) loss to optimize DA classifiers on low-resource data with imbalanced DA distribution. The DA classifiers in these studies tend to prioritize accuracy on the majority class at the expense of the minority class which might not be robust to the data with imbalanced ratios of different DA classes. To optimize the robustness of classifiers on imbalanced class distributions, we propose to optimize the performance of the DA classifier by maximizing the area under the ROC curve (AUC) score (i.e., AUC maximization). Through extensive experiments, our study provides evidence that (i) by maximizing AUC in the training process, the DA classifier achieves significant performance improvement compared to the CE approach under low-resource data, and (ii) AUC maximization approaches can improve the robustness of the DA classifier under different class imbalance ratios.
CLApr 12, 2023
Does Informativeness Matter? Active Learning for Educational Dialogue Act ClassificationWei Tan, Jionghao Lin, David Lang et al.
Dialogue Acts (DAs) can be used to explain what expert tutors do and what students know during the tutoring process. Most empirical studies adopt the random sampling method to obtain sentence samples for manual annotation of DAs, which are then used to train DA classifiers. However, these studies have paid little attention to sample informativeness, which can reflect the information quantity of the selected samples and inform the extent to which a classifier can learn patterns. Notably, the informativeness level may vary among the samples and the classifier might only need a small amount of low informative samples to learn the patterns. Random sampling may overlook sample informativeness, which consumes human labelling costs and contributes less to training the classifiers. As an alternative, researchers suggest employing statistical sampling methods of Active Learning (AL) to identify the informative samples for training the classifiers. However, the use of AL methods in educational DA classification tasks is under-explored. In this paper, we examine the informativeness of annotated sentence samples. Then, the study investigates how the AL methods can select informative samples to support DA classifiers in the AL sampling process. The results reveal that most annotated sentences present low informativeness in the training dataset and the patterns of these sentences can be easily captured by the DA classifier. We also demonstrate how AL methods can reduce the cost of manual annotation in the AL sampling process.
HCMar 22
When Machines Join the Moral Circle: The Persona Effect of Generative AI Agents in Collaborative ReasoningYueqiao Jin, Roberto Martinez-Maldonado, Wanruo Shi et al.
Generative AI is increasingly positioned as a peer in collaborative learning, yet its effects on ethical deliberation remain unclear. We report a between-subjects experiment with university students (N=217) who discussed an autonomous-vehicle dilemma in triads under three conditions: human-only control, supportive AI teammate, or contrarian AI teammate. Using moral foundations lexicons, argumentative coding from the augmentative knowledge construction framework, semantic trajectory modelling with BERTopic and dynamic time warping, and epistemic network analysis, we traced how AI personas reshape moral discourse. Supportive AIs increased grounded/qualified claims relative to control, consolidating integrative reasoning around care/fairness, while contrarian AIs modestly broadened moral framing and sustained value pluralism. Both AI conditions reduced thematic drift compared with human-only groups, indicating more stable topical focus. Post-discussion justification complexity was only weakly predicted by moral framing and reasoning quality, and shifts in final moral decisions were driven primarily by participants' initial stance rather than condition. Overall, AI teammates altered the process, the distribution and connection of moral frames and argument quality, more than the outcome of moral choice, highlighting the potential of generative AI agents as teammates for eliciting reflective, pluralistic moral reasoning in collaborative learning.
SIApr 22
Heterogeneous Interaction Network Analysis (HINA): A New Learning Analytics Approach for Modelling, Analyzing, and Visualizing Complex Interactions in Learning ProcessesShihui Feng, Baiyue He, Dragan Gasevic et al.
Existing learning analytics approaches, which often model learning processes as sequences of learner actions or homogeneous relationships, are limited in capturing the distributed, multi-faceted nature of interactions in contemporary learning environments. To address this, we propose Heterogeneous Interaction Network Analysis (HINA), a novel multi-level learning analytics framework for modeling complex learning processes across diverse entities (e.g., learners, behaviours, AI agents, and task designs). HINA integrates a set of original methods, including summative measures and a new non-parametric clustering technique, with established practices for statistical testing and interactive visualization to provide a flexible and powerful analytical toolkit. In this paper, we first detail the theoretical and mathematical foundations of HINA for individual, dyadic, and meso-level analysis. We then demonstrate HINA's utility through a case study on AI-mediated small-group collaborative learning, revealing students' interaction profiles with peers versus AI; distinct engagement patterns that emerge from these interactions; and specific types of learning behaviors (e.g., asking questions, planning) directed to AI versus peers. By transforming process data into Heterogeneous Interaction Networks (HINs), HINA introduces a new paradigm for modeling learning processes and provides the dedicated, multi-level analytical methods required to extract meaning from them. It thereby moves beyond a single process data type to quantify and visualize how different elements in a learning environment interact and co-influence each other, opening new avenues for understanding complex educational dynamics.
HCMar 14
Toward Scalable Co-located Practical Learning: Assisting with Computer Vision and Multimodal AnalyticsXinyu Li, Linxuan Zhao, Roberto Martinez-Maldonado et al.
This study examined whether a single ceiling-mounted camera could be used to capture fine-grained learning behaviours in co-located practical learning. In undergraduate nursing simulations, teachers first identified seven observable behaviour categories, which were then used to train a YOLO-based detector. Video data were collected from 52 sessions, and analyses focused on Scenario A because it produced greater behavioural variation than Scenario B. Annotation reliability was high (F1=0.933). On the held-out test set, the model achieved a precision of 0.789, a recall of 0.784, and an mAP@0.5 of 0.827. When only behaviour frequencies were compared, no robust differences were found between high- and low-performing groups. However, when behaviour labels were analysed together with spatial context, clear differences emerged in both task and collaboration performance. Higher-performing teams showed more patient interaction in the primary work area, whereas lower-performing teams showed more phone-related activity and more activity in secondary areas. These findings suggest that behavioural data are more informative when interpreted together with where they occur. Overall, the study shows that a single-camera computer vision approach can support the analysis of teamwork and task engagement in face-to-face practical learning without relying on wearable sensors.
CYDec 4, 2025
Uncovering Students' Inquiry Patterns in GenAI-Supported Clinical Practice: An Integration of Epistemic Network Analysis and Sequential Pattern MiningJiameng Wei, Dinh Dang, Kaixun Yang et al.
Assessment of medication history-taking has traditionally relied on human observation, limiting scalability and detailed performance data. While Generative AI (GenAI) platforms enable extensive data collection and learning analytics provide powerful methods for analyzing educational traces, these approaches remain largely underexplored in pharmacy clinical training. This study addresses this gap by applying learning analytics to understand how students develop clinical communication competencies with GenAI-powered virtual patients -- a crucial endeavor given the diversity of student cohorts, varying language backgrounds, and the limited opportunities for individualized feedback in traditional training settings. We analyzed 323 students' interaction logs across Australian and Malaysian institutions, comprising 50,871 coded utterances from 1,487 student-GenAI dialogues. Combining Epistemic Network Analysis to model inquiry co-occurrences with Sequential Pattern Mining to capture temporal sequences, we found that high performers demonstrated strategic deployment of information recognition behaviors. Specifically, high performers centered inquiry on recognizing clinically relevant information, integrating rapport-building and structural organization, while low performers remained in routine question-verification loops. Demographic factors including first-language background, prior pharmacy work experience, and institutional context, also shaped distinct inquiry patterns. These findings reveal inquiry patterns that may indicate clinical reasoning development in GenAI-assisted contexts, providing methodological insights for health professions education assessment and informing adaptive GenAI system design that supports diverse learning pathways.
CLJun 3, 2024Code
Annotation Guidelines-Based Knowledge Augmentation: Towards Enhancing Large Language Models for Educational Text ClassificationShiqi Liu, Sannyuya Liu, Lele Sha et al.
Various machine learning approaches have gained significant popularity for the automated classification of educational text to identify indicators of learning engagement -- i.e. learning engagement classification (LEC). LEC can offer comprehensive insights into human learning processes, attracting significant interest from diverse research communities, including Natural Language Processing (NLP), Learning Analytics, and Educational Data Mining. Recently, Large Language Models (LLMs), such as ChatGPT, have demonstrated remarkable performance in various NLP tasks. However, their comprehensive evaluation and improvement approaches in LEC tasks have not been thoroughly investigated. In this study, we propose the Annotation Guidelines-based Knowledge Augmentation (AGKA) approach to improve LLMs. AGKA employs GPT 4.0 to retrieve label definition knowledge from annotation guidelines, and then applies the random under-sampler to select a few typical examples. Subsequently, we conduct a systematic evaluation benchmark of LEC, which includes six LEC datasets covering behavior classification (question and urgency level), emotion classification (binary and epistemic emotion), and cognition classification (opinion and cognitive presence). The study results demonstrate that AGKA can enhance non-fine-tuned LLMs, particularly GPT 4.0 and Llama 3 70B. GPT 4.0 with AGKA few-shot outperforms full-shot fine-tuned models such as BERT and RoBERTa on simple binary classification datasets. However, GPT 4.0 lags in multi-class tasks that require a deep understanding of complex semantic information. Notably, Llama 3 70B with AGKA is a promising combination based on open-source LLM, because its performance is on par with closed-source GPT 4.0 with AGKA. In addition, LLMs struggle to distinguish between labels with similar names in multi-class classification.
HCApr 17, 2024
Large Language Models Meet User Interfaces: The Case of Provisioning FeedbackStanislav Pozdniakov, Jonathan Brazil, Solmaz Abdi et al.
Incorporating Generative AI (GenAI) and Large Language Models (LLMs) in education can enhance teaching efficiency and enrich student learning. Current LLM usage involves conversational user interfaces (CUIs) for tasks like generating materials or providing feedback. However, this presents challenges including the need for educator expertise in AI and CUIs, ethical concerns with high-stakes decisions, and privacy risks. CUIs also struggle with complex tasks. To address these, we propose transitioning from CUIs to user-friendly applications leveraging LLMs via API calls. We present a framework for ethically incorporating GenAI into educational tools and demonstrate its application in our tool, Feedback Copilot, which provides personalized feedback on student assignments. Our evaluation shows the effectiveness of this approach, with implications for GenAI researchers, educators, and technologists. This work charts a course for the future of GenAI in education.
HCApr 25
Scalable LLM-based Coding of Dialogue in Healthcare Simulation: Balancing Coding Performance, Processing Time, and Environmental ImpactKiyoshige Garces, Gloria Milena Fernandez-Nieto, Linxuan Zhao et al.
Research shows that dialogue, the interactive process through which participants articulate their thinking, plays a central role in constructing shared understanding, coordinating action, and shaping learning outcomes in teams. Analysing dialogue content has been central to advancing team learning theory and informing the design of computer-supported collaborative learning environments, yet this progress has depended on labour-intensive qualitative coding. LLMs offer new possibilities for automating and enhancing the dialogue layer within emerging multimodal learning analytics approaches, with recent studies showing that they can approximate human coding through few-shot prompting. However, prior work has focused on replicating human coding accuracy for research purposes, rather than addressing a more educationally consequential question: how can we design prompts that allow an LLM to label team dialogue accurately and fast enough to be useful in real settings, such as in-person healthcare simulations, where results must be returned quickly and computational cost and sustainability also matter? This paper investigates how prompt design and batching strategies can be optimised to balance coding accuracy, processing time, and environmental impact in team-based healthcare simulation debriefing. Using a dataset of 11,647 utterances coded across 6 dialogue constructs, we compared 4 prompt designs across varying batch sizes, evaluating coding performance, processing time, and energy consumption, as well as the trade-offs between these metrics. Results indicate that increasing batch size improves speed and reduces energy use, but negatively impacts coding performance. Beyond demonstrating the feasibility of LLM-based qualitative analysis, this study offers practical guidance for scaling dialogue analytics in contexts where timeliness, privacy, and sustainability are critical.
IRJun 18, 2024
PromptDSI: Prompt-based Rehearsal-free Continual Learning for Document RetrievalTuan-Luc Huynh, Thuy-Trang Vu, Weiqing Wang et al.
Differentiable Search Index (DSI) utilizes pre-trained language models to perform indexing and document retrieval via end-to-end learning without relying on external indexes. However, DSI requires full re-training to index new documents, causing significant computational inefficiencies. Continual learning (CL) offers a solution by enabling the model to incrementally update without full re-training. Existing CL solutions in document retrieval rely on memory buffers or generative models for rehearsal, which is infeasible when accessing previous training data is restricted due to privacy concerns. To this end, we introduce PromptDSI, a prompt-based, rehearsal-free continual learning approach for document retrieval. PromptDSI follows the Prompt-based Continual Learning (PCL) framework, using learnable prompts to efficiently index new documents without accessing previous documents or queries. To improve retrieval latency, we remove the initial forward pass of PCL, which otherwise greatly increases training and inference time, with a negligible trade-off in performance. Additionally, we introduce a novel topic-aware prompt pool that employs neural topic embeddings as fixed keys, eliminating the instability of prompt key optimization while maintaining competitive performance with existing PCL prompt pools. In a challenging rehearsal-free continual learning setup, we demonstrate that PromptDSI variants outperform rehearsal-based baselines, match the strong cache-based baseline in mitigating forgetting, and significantly improving retrieval performance on new corpora.
CYOct 15, 2019
A Multivariate Elo-based Learner Model for Adaptive Educational SystemsSolmaz Abdi, Hassan Khosravi, Shazia Sadiq et al.
The Elo rating system has been recognised as an effective method for modelling students and items within adaptive educational systems. The existing Elo-based models have the limiting assumption that items are only tagged with a single concept and are mainly studied in the context of adaptive testing systems. In this paper, we introduce a multivariate Elo-based learner model that is suitable for the domains where learning items can be tagged with multiple concepts, and investigate its fit in the context of adaptive learning. To evaluate the model, we first compare the predictive performance of the proposed model against the standard Elo-based model using synthetic and public data sets. Our results from this study indicate that our proposed model has superior predictive performance compared to the standard Elo-based model, but the difference is rather small. We then investigate the fit of the proposed multivariate Elo-based model by integrating it into an adaptive learning system which incorporates the principles of open learner models (OLMs). The results from this study suggest that the availability of additional parameters derived from multivariate Elo-based models have two further advantages: guiding adaptive behaviour for the system and providing additional insight for students and instructors.