HCAI · Jul 5, 2025

Evaluating the Effectiveness of Large Language Models in Solving Simple Programming Tasks: A User-Centered Study

arXiv:2507.04043v1 · 1 citation · BIBM
Originality: Incremental advance
AI Analysis

This research addresses the problem of optimizing LLM interactions for novice programmers in educational tools, though it is incremental as it builds on existing user-centered design concepts.

The study investigated how different interaction styles (passive, proactive, collaborative) with ChatGPT-4o affect user performance on simple programming tasks, finding that the collaborative style significantly improved task completion time and increased user satisfaction and perceived helpfulness.

As large language models (LLMs) become more common in educational tools and programming environments, questions arise about how these systems should interact with users. This study investigates how different interaction styles with ChatGPT-4o (passive, proactive, and collaborative) affect user performance on simple programming tasks. I conducted a within-subjects experiment where fifteen high school students participated, completing three problems under three distinct versions of the model. Each version was designed to represent a specific style of AI support: responding only when asked, offering suggestions automatically, or engaging the user in back-and-forth dialogue. Quantitative analysis revealed that the collaborative interaction style significantly improved task completion time compared to the passive and proactive conditions. Participants also reported higher satisfaction and perceived helpfulness when working with the collaborative version. These findings suggest that the way an LLM communicates, how it guides, prompts, and responds, can meaningfully impact learning and performance. This research highlights the importance of designing LLMs that go beyond functional correctness to support more interactive, adaptive, and user-centered experiences, especially for novice programmers.
