CY CL HC · Feb 14, 2023

Learning gain differences between ChatGPT and human tutor generated algebra hints

arXiv:2302.06871v120.2145 citations · h-index: 35

Originality: Incremental advance

AI Analysis

The paper addresses the potential for scaling educational content production in adaptive tutoring systems, but its results indicate only incremental progress, as ChatGPT-generated hints underperformed human-authored ones.

The study compared learning gains from ChatGPT-generated algebra hints versus human tutor hints, finding that while both produced positive gains, only human hints showed statistically significant improvements, with human hints yielding substantially higher gains in both algebra topics.

Large Language Models (LLMs), such as ChatGPT, are quickly advancing AI to the frontiers of practical consumer use and leading industries to re-evaluate how they allocate resources for content production. Authoring of open educational resources and hint content within adaptive tutoring systems is labor intensive. Should LLMs like ChatGPT produce educational content on par with human-authored content, the implications would be significant for further scaling of computer tutoring system approaches. In this paper, we conduct the first learning gain evaluation of ChatGPT by comparing the efficacy of its hints with hints authored by human tutors with 77 participants across two algebra topic areas, Elementary Algebra and Intermediate Algebra. We find that 70% of hints produced by ChatGPT passed our manual quality checks and that both human and ChatGPT conditions produced positive learning gains. However, gains were only statistically significant for human tutor created hints. Learning gains from human-created hints were substantially and statistically significantly higher than ChatGPT hints in both topic areas, though ChatGPT participants in the Intermediate Algebra experiment were near ceiling and not even with the control at pre-test. We discuss the limitations of our study and suggest several future directions for the field. Problem and hint content used in the experiment is provided for replicability.
