SE | AI | HC | Aug 15, 2023

Large Language Models in Introductory Programming Education: ChatGPT's Performance and Implications for Assessments

arXiv:2308.08572v1 | 34 citations | h-index: 8
Originality: Synthesis-oriented
AI Analysis

This paper addresses the challenge instructors and students face in integrating LLMs into programming education and assessments. Its contribution is incremental, since it applies existing models to new educational data.

The paper evaluated ChatGPT-3.5 and GPT-4 on 72 introductory Python tasks, finding high correctness rates of 94.4% to 95.8% and reliable generation of explanations and code.

This paper investigates the performance of the Large Language Models (LLMs) ChatGPT-3.5 and GPT-4 in solving introductory programming tasks. Based on the performance, implications for didactic scenarios and assessment formats utilizing LLMs are derived. For the analysis, 72 Python tasks for novice programmers were selected from the free site CodingBat. Full task descriptions were used as input to the LLMs, while the generated replies were evaluated using CodingBat's unit tests. In addition, the general availability of textual explanations and program code was analyzed. The results show high scores of 94.4 to 95.8% correct responses and reliable availability of textual explanations and program code, which opens new ways to incorporate LLMs into programming education and assessment.
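The evaluation procedure described above (feed a task description to the model, then run the returned code against unit tests) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' actual harness: the helper names `load_solution`, `passes_all`, and `correctness_rate`, the toy task, and its test cases are all hypothetical, and the "generated reply" is a hard-coded string rather than real model output. If each of the 72 tasks counts once, the reported 94.4% and 95.8% would correspond to 68 and 69 solved tasks.

```python
from typing import Any, Callable, Sequence, Tuple

# Hypothetical test cases for a toy warm-up style task: (arguments, expected result).
TEST_CASES: Sequence[Tuple[tuple, Any]] = [
    ((False, False), True),
    ((True, False), False),
    ((False, True), True),
    ((True, True), True),
]

# Stand-in for the code portion of a model reply (assumed, not real LLM output).
GENERATED_REPLY = """
def sleep_in(weekday, vacation):
    return not weekday or vacation
"""


def load_solution(source: str, func_name: str) -> Callable:
    """Execute generated source in an isolated namespace and return the named function."""
    namespace: dict = {}
    exec(source, namespace)  # only do this with trusted or sandboxed code
    return namespace[func_name]


def passes_all(func: Callable, cases: Sequence[Tuple[tuple, Any]]) -> bool:
    """A task counts as solved only if every unit test passes without raising."""
    for args, expected in cases:
        try:
            if func(*args) != expected:
                return False
        except Exception:
            return False
    return True


def correctness_rate(results: Sequence[bool]) -> float:
    """Percentage of tasks solved, given one boolean per task."""
    return 100.0 * sum(results) / len(results)


solution = load_solution(GENERATED_REPLY, "sleep_in")
solved = passes_all(solution, TEST_CASES)

# Illustrative arithmetic only: 68 or 69 solved tasks out of 72.
rate_low = round(correctness_rate([True] * 68 + [False] * 4), 1)
rate_high = round(correctness_rate([True] * 69 + [False] * 3), 1)
```

Treating any exception as a failed test keeps a crashing reply from being counted as correct. Running generated code through `exec` is acceptable for a sketch, but a real setup would isolate it in a sandbox or a separate process.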

