A Large-Scale Real-World Evaluation of LLM-Based Virtual Teaching Assistant
This work addresses the need for empirical evaluation of AI-driven educational tools in real classrooms, but its contribution is incremental: it assesses existing technology rather than introducing new methods.
The study deploys an LLM-based virtual teaching assistant in an introductory AI programming course with 477 graduate students, tracks student perceptions through three rounds of surveys, and analyzes 3,869 student-VTA interaction pairs to identify common question types and engagement patterns. It is motivated by the limited empirical evidence on the practical impact and acceptance of such assistants in real classrooms.
Virtual Teaching Assistants (VTAs) powered by Large Language Models (LLMs) have the potential to enhance student learning by providing instant feedback and facilitating multi-turn interactions. However, empirical studies on their effectiveness and acceptance in real-world classrooms are limited, leaving their practical impact uncertain. In this study, we develop an LLM-based VTA and deploy it in an introductory AI programming course with 477 graduate students. To assess how student perceptions of the VTA's performance evolve over time, we conduct three rounds of comprehensive surveys at different stages of the course. Additionally, we analyze 3,869 student-VTA interaction pairs to identify common question types and engagement patterns. We then compare these interactions with traditional student-human instructor interactions to evaluate the VTA's role in the learning process. Through a large-scale empirical study and interaction analysis, we assess the feasibility of deploying VTAs in real-world classrooms and identify key challenges for broader adoption. Finally, we release the source code of our VTA system, fostering future advancements in AI-driven education: https://github.com/sean0042/VTA.