CL · Jun 12, 2024

Analyzing Large Language Models for Classroom Discussion Assessment

arXiv:2406.08680v1 · 13 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses the problem of automating classroom discussion assessment for educators. Its contribution is incremental: it analyzes existing LLMs rather than introducing new methods.

The study investigates how task formulation, context length, and few-shot examples affect the performance of two large language models (LLMs) in assessing classroom discussion quality. It finds that all three factors influence performance and that predictive consistency is related to performance. It recommends an approach that balances predictive performance, computational efficiency, and consistency.

Automatically assessing classroom discussion quality is becoming increasingly feasible with the help of new NLP advancements such as large language models (LLMs). In this work, we examine how the assessment performance of 2 LLMs interacts with 3 factors that may affect performance: task formulation, context length, and few-shot examples. We also explore the computational efficiency and predictive consistency of the 2 LLMs. Our results suggest that the 3 aforementioned factors do affect the performance of the tested LLMs and there is a relation between consistency and performance. We recommend an LLM-based assessment approach that has a good balance in terms of predictive performance, computational efficiency, and consistency.
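One way to make "predictive consistency" and its relation to performance concrete is sketched below. This is a minimal illustration under stated assumptions, not the paper's procedure: it assumes each discussion item is scored several times by the same LLM under the same prompt (for example, with sampling enabled), that the majority vote over those runs is the final prediction, and that an item's consistency is the share of runs agreeing with that majority. The function names `item_consistency`, `majority_label`, and `consistency_report` are hypothetical.

```python
from collections import Counter


def majority_label(predictions):
    """Most common score across repeated runs (ties go to the first seen)."""
    return Counter(predictions).most_common(1)[0][0]


def item_consistency(predictions):
    """Hypothetical consistency measure: share of runs that agree with the majority score."""
    top_count = Counter(predictions).most_common(1)[0][1]
    return top_count / len(predictions)


def _avg(values):
    # Average that returns NaN instead of failing on an empty group.
    return sum(values) / len(values) if values else float("nan")


def consistency_report(runs, gold):
    """Relate consistency to accuracy.

    runs: one list of repeated LLM scores per item (assumed input format).
    gold: one human-assigned score per item.
    """
    if len(runs) != len(gold):
        raise ValueError("runs and gold must have the same length")
    consistency = [item_consistency(r) for r in runs]
    correct = [majority_label(r) == g for r, g in zip(runs, gold)]
    return {
        "accuracy": sum(correct) / len(correct),
        "mean_consistency": _avg(consistency),
        # If consistency tracks performance, the first value should exceed the second.
        "consistency_when_correct": _avg([c for c, ok in zip(consistency, correct) if ok]),
        "consistency_when_wrong": _avg([c for c, ok in zip(consistency, correct) if not ok]),
    }


if __name__ == "__main__":
    # Toy data: four items, each scored three times; values are illustrative only.
    runs = [[3, 3, 3], [2, 2, 4], [1, 4, 2], [4, 4, 4]]
    gold = [3, 2, 3, 1]
    print(consistency_report(runs, gold))
```

Splitting mean consistency by whether the majority prediction was correct is a simple way to check whether consistent items also tend to be the accurate ones; the toy data also shows that a fully consistent prediction can still be wrong.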

Code Implementations: 1 repo
