SEAI · Nov 2, 2023

The Behavior of Large Language Models When Prompted to Generate Code Explanations

arXiv:2311.01490v2 · 8 citations · h-index: 7
AI Analysis

This paper addresses the problem of understanding LLM behavior when generating code explanations for educational use. The contribution is incremental: it examines specific factors and metrics without proposing new methods.

The paper investigates how Large Language Models generate explanations for code from introductory programming courses, finding that while explanations are generally correct and have consistent readability levels for Java and Python, they score lower on completeness, conciseness, and specificity.

This paper systematically investigates the generation of code explanations by Large Language Models (LLMs) for code examples commonly encountered in introductory programming courses. Our findings reveal significant variations in the nature of code explanations produced by LLMs, influenced by factors such as the wording of the prompt, the specific code examples under consideration, the programming language involved, the temperature parameter, and the version of the LLM. However, a consistent pattern emerges for Java and Python, where explanations exhibit a Flesch-Kincaid readability level of approximately grade 7-8 and a consistent lexical density, that is, the proportion of meaningful words relative to the total explanation size. Additionally, the generated explanations consistently achieve high scores for correctness, but lower scores on three other metrics: completeness, conciseness, and specificity.
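
To make the two text metrics named in the abstract concrete, here is a minimal sketch of how they could be computed. The Flesch-Kincaid grade formula is the standard one; everything else is an assumption for illustration and not the authors' pipeline: the vowel-group syllable counter is a rough heuristic, and "meaningful words" are approximated with a small hand-written function-word list (`FUNCTION_WORDS`, a hypothetical name) rather than part-of-speech tagging.

```python
import re

# Illustrative function-word list (an assumption; a real analysis would
# more likely use part-of-speech tagging to identify content words).
FUNCTION_WORDS = {
    "a", "an", "the", "and", "or", "but", "if", "of", "to", "in", "on",
    "at", "by", "for", "with", "from", "as", "is", "are", "was", "were",
    "be", "been", "it", "its", "this", "that", "these", "those", "then",
    "than", "so", "not", "we", "you", "he", "she", "they", "i",
}


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[A-Za-z']+", text.lower())


def count_sentences(text):
    """Count runs of sentence-ending punctuation, with a minimum of 1."""
    return max(1, len(re.findall(r"[.!?]+", text)))


def count_syllables(word):
    """Rough heuristic: count vowel groups, discounting a silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(1, count)


def flesch_kincaid_grade(text):
    """Flesch-Kincaid grade level:
    0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
    """
    words = tokenize(text)
    if not words:
        raise ValueError("text contains no words")
    sentences = count_sentences(text)
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / sentences + 11.8 * syllables / len(words) - 15.59


def lexical_density(text):
    """Share of tokens that are not in the function-word list."""
    words = tokenize(text)
    if not words:
        raise ValueError("text contains no words")
    content_words = [w for w in words if w not in FUNCTION_WORDS]
    return len(content_words) / len(words)


if __name__ == "__main__":
    explanation = "The loop prints the sum of the numbers."
    print(round(flesch_kincaid_grade(explanation), 2))
    print(round(lexical_density(explanation), 2))
```

Because the syllable counter and the function-word list are both approximations, scores from this sketch will differ somewhat from those of established readability tools; it is meant only to show what the two numbers measure.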

