CY AI SE
Dec 9, 2024

Can LLMs Identify Gaps and Misconceptions in Students' Code Explanations?

Priti Oli, Rabin Banjade, Andrew M. Olney, Vasile Rus

arXiv:2501.10365v1
3.35 citations
h-index: 7

Originality: Incremental advance

AI Analysis

This work addresses the challenge of automating assessment of student-generated responses in code comprehension, though it is incremental in applying existing LLM methods to a specific educational domain.

The paper tackled the problem of automatically identifying gaps and misconceptions in students' self-explanations of code examples, finding that fine-tuned large language models, especially with preference optimization, outperformed zero-shot and few-shot prompting techniques.

This paper investigates various approaches using Large Language Models (LLMs) to identify gaps and misconceptions in students' self-explanations of specific instructional material, in our case explanations of code examples. This research is a part of our larger effort to automate the assessment of students' freely generated responses, focusing specifically on their self-explanations of code examples during activities related to code comprehension. In this work, we experiment with zero-shot prompting, Supervised Fine-Tuning (SFT), and preference alignment of LLMs to identify gaps in students' self-explanation. With simple prompting, GPT-4 consistently outperformed LLaMA3 and Mistral in identifying gaps and misconceptions, as confirmed by human evaluations. Additionally, our results suggest that fine-tuned large language models are more effective at identifying gaps in students' explanations compared to zero-shot and few-shot prompting techniques. Furthermore, our findings show that the preference optimization approach using Odds Ratio Preference Optimization (ORPO) outperforms SFT in identifying gaps and misconceptions in students' code explanations.
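For readers unfamiliar with ORPO, the sketch below illustrates the general shape of its objective as commonly described: the usual SFT negative log-likelihood on the preferred response plus a weighted odds-ratio penalty that pushes the preferred response's odds above the rejected one's. It is not the authors' training code. The function names, the weight `lam`, and the example log-probabilities are assumptions made for illustration, and the inputs are assumed to be length-normalized (average per-token) log-probabilities that are strictly negative.

```python
import math


def log_odds(avg_logprob: float) -> float:
    """Return log(p / (1 - p)) for p = exp(avg_logprob).

    avg_logprob is assumed to be the average per-token log-probability
    of a response, so it must be strictly negative (p < 1).
    """
    p = math.exp(avg_logprob)
    return avg_logprob - math.log1p(-p)


def orpo_loss(avg_logp_chosen: float, avg_logp_rejected: float, lam: float = 0.1) -> float:
    """Illustrative ORPO-style loss for a single preference pair.

    lam is a hypothetical weight on the odds-ratio term.
    """
    # SFT term: negative log-likelihood of the preferred response.
    sft_term = -avg_logp_chosen
    # Log of the odds ratio between preferred and rejected responses.
    log_odds_ratio = log_odds(avg_logp_chosen) - log_odds(avg_logp_rejected)
    # -log(sigmoid(z)) written as log(1 + exp(-z)).
    odds_ratio_term = math.log1p(math.exp(-log_odds_ratio))
    return sft_term + lam * odds_ratio_term


# A pair where the model already prefers the chosen response is penalized
# less than a pair where both responses are equally likely.
well_separated = orpo_loss(math.log(0.8), math.log(0.2))
not_separated = orpo_loss(math.log(0.8), math.log(0.8))
print(well_separated < not_separated)
```

Plain SFT would use only the first term, which says nothing about the rejected response; the second term is what makes the objective preference-aware without a separate reference model.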
