Feedback Indicators: The Alignment between Llama and a Teacher in Language Learning
This work addresses the challenge of automating feedback generation to assist teachers and enhance student learning in language education. Its contribution is incremental, since it covers only the initial indicator extraction phase.
This study uses the Llama 3.1 large language model to extract feedback indicators from student submissions in language learning, and finds statistically significant, strong correlations between LLM-generated indicators and human ratings across various criteria.
Automated feedback generation has the potential to enhance students' learning progress by providing timely and targeted feedback. Moreover, it can help teachers make better use of their time, allowing them to focus on more strategic and personalized aspects of teaching. To generate high-quality, information-rich formative feedback, relevant indicators must first be extracted, as they form the foundation on which the feedback is built. Teachers often employ feedback criteria grids composed of various indicators that they evaluate systematically. This study examines the initial phase of extracting such indicators from student submissions in a language learning course using the large language model Llama 3.1. To this end, it investigates the alignment between indicators generated by the LLM and human ratings across various feedback criteria. The findings demonstrate statistically significant, strong correlations, even in cases involving unanticipated combinations of indicators and criteria. The methodology employed in this paper offers a promising foundation for extracting indicators from students' submissions using LLMs. Such indicators could be used in future research to auto-generate explainable and transparent formative feedback.
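As an illustration of how alignment between LLM-generated indicators and human ratings might be quantified, the following minimal sketch computes a Spearman rank correlation in plain Python. The choice of Spearman's coefficient, the function names, and the score lists are assumptions made for illustration only. The text above does not state which correlation measure was used, and the numbers below are not data from the study. A significance test is omitted.

```python
from statistics import mean


def average_ranks(values):
    """Assign 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on tie-averaged ranks."""
    if len(x) != len(y):
        raise ValueError("both score lists must have the same length")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical scores for one criterion across six submissions (not study data).
llm_indicator = [2, 4, 3, 5, 1, 4]
human_rating = [1, 4, 3, 5, 2, 5]

rho = spearman(llm_indicator, human_rating)
print(f"Spearman rho = {rho:.3f}")
```

In practice, one such coefficient would be computed for each pairing of indicator and criterion, and a library routine that also reports a p-value would be the natural choice for assessing significance.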