Johan Jeuring

h-index: 2

5 papers

2 citations

Novelty: 41%

AI Score: 43

Ranked #81,279 of 201,326 authors (top 40%); #5,073 in AI (top 36%)

5 Papers

CY · Jun 1

The Use of Computational Thinking Skills, Difficulties, and Strategies of Introductory Programming Students Solving Bebras Tasks

Enrico Benedetti, Isaac Alpizar-Chacon, Johan Jeuring

Computational thinking (CT) is regarded as a fundamental skill set everyone should learn. Identifying when and how CT skills are used is challenging but important to inform interventions supporting their development. Previous research has examined how students and experts apply CT skills when solving introductory computational problems. However, the extent to which higher education students in introductory programming courses do so in depth is underexplored. We address this gap by examining how those students apply CT skills when solving computational problems, the difficulties they encounter, and the strategies they employ. We collected plans and solutions to Bebras tasks (short problems introducing CS concepts and considered effective for eliciting CT skills) in an introductory programming course for non-CS majors. We gathered 241 submissions from 58 students across five tasks, along with post-task comments and reflections on strategies. We analyzed the data using descriptive statistics, applied an existing coding scheme to identify CT skills, and conducted thematic analysis to identify difficulties and strategies. Submissions varied in structure and level of detail. The most prevalent CT skills were algorithmic thinking, abstraction, and decomposition, while evaluation and generalization appeared much less frequently. CT skill presence was positively associated with correct answers. Students faced challenges in four areas, including understanding the tasks and making a plan, and reported various problem-solving strategies. Consolidating and extending prior research on CT skills and problem solving, our findings show that students in introductory programming apply CT skills but can struggle to solve problems systematically and explain their reasoning. Furthermore, Bebras tasks create opportunities for this population to engage CT skills and could be used in future research.

CL · Sep 22, 2025

Evaluating LLM-Generated Versus Human-Authored Responses in Role-Play Dialogues

Dongxu Lu, Johan Jeuring, Albert Gatt

Evaluating large language models (LLMs) in long-form, knowledge-grounded role-play dialogues remains challenging. This study compares LLM-generated and human-authored responses in multi-turn professional training simulations through human evaluation ($N=38$) and automated LLM-as-a-judge assessment. Human evaluation revealed significant degradation in LLM-generated response quality across turns, particularly in naturalness, context maintenance and overall quality, while human-authored responses progressively improved. In line with this finding, participants also indicated a consistent preference for human-authored dialogue. These human judgements were validated by our automated LLM-as-a-judge evaluation, where Gemini 2.0 Flash achieved strong alignment with human evaluators on both zero-shot pairwise preference and stochastic 6-shot construct ratings, confirming the widening quality gap between LLM and human responses over time. Our work contributes a multi-turn benchmark exposing LLM degradation in knowledge-grounded role-play dialogues and provides a validated hybrid evaluation framework to guide the reliable integration of LLMs in training simulations.

AI · Jul 18, 2025

Combining model tracing and constraint-based modeling for multistep strategy diagnoses

Gerben van der Hoek, Johan Jeuring, Rogier Bos

Model tracing and constraint-based modeling are two approaches to diagnose student input in stepwise tasks. Model tracing supports identifying consecutive problem-solving steps taken by a student, whereas constraint-based modeling supports student input diagnosis even when several steps are combined into one step. We propose an approach that merges both paradigms. By defining constraints as properties that a student input has in common with a step of a strategy, it is possible to provide a diagnosis when a student deviates from a strategy even when the student combines several steps. In this study we explore the design of a system for multistep strategy diagnoses, and evaluate these diagnoses. As a proof of concept, we generate diagnoses for an existing dataset containing steps students take when solving quadratic equations (n=2136). To compare with human diagnoses, two teachers coded a random sample of deviations (n=70) and applications of the strategy (n=70). Results show that the system diagnosis aligned with the teacher coding in all of the 140 student steps.

AI · Jul 18, 2025

Buggy rule diagnosis for combined steps through final answer evaluation in stepwise tasks

Gerben van der Hoek, Johan Jeuring, Rogier Bos

Many intelligent tutoring systems can support a student in solving a stepwise task. When a student combines several steps in one step, the number of possible paths connecting consecutive inputs may be very large. This combinatorial explosion makes error diagnosis hard. Using a final answer to diagnose a combination of steps can mitigate the combinatorial explosion, because there are generally fewer possible (erroneous) final answers than (erroneous) solution paths. An intermediate input for a task can be diagnosed by automatically completing it according to the task solution strategy and diagnosing this solution. This study explores the potential of automated error diagnosis based on a final answer. We investigate the design of a service that provides a buggy rule diagnosis when a student combines several steps. To validate the approach, we apply the service to an existing dataset (n=1939) of unique student steps when solving quadratic equations, which could not be diagnosed by a buggy rule service that tries to connect consecutive inputs with a single rule. Results show that final answer evaluation can diagnose 29.4% of these steps. Moreover, a comparison of the generated diagnoses with teacher diagnoses on a subset (n=115) shows that the diagnoses align in 97% of the cases. These results can be considered a basis for further exploration of the approach.
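The idea of completing an intermediate input and diagnosing the resulting final answer can be illustrated with a minimal Python sketch. It is not the authors' service: the representation of an equation as integer coefficients `(a, b, c)` with `a` nonzero, the two entries in `BUGGY_RULES`, and the function names are all assumptions made for illustration, and the sketch only handles quadratics with rational roots.

```python
from fractions import Fraction
from math import isqrt


def solve_quadratic(a, b, c):
    """Return the set of rational roots of a*x^2 + b*x + c = 0 (a != 0)."""
    disc = b * b - 4 * a * c
    if disc < 0:
        return frozenset()
    root = isqrt(disc)
    if root * root != disc:
        raise ValueError("this sketch only handles rational roots")
    return frozenset({Fraction(-b + root, 2 * a), Fraction(-b - root, 2 * a)})


# Hypothetical buggy rules: each maps the task coefficients to the
# coefficients a student would reach after making that specific error.
BUGGY_RULES = {
    "sign error on constant term": lambda a, b, c: (a, b, -c),
    "sign error on linear term": lambda a, b, c: (a, -b, c),
}


def diagnose(task, student_input):
    """Diagnose an intermediate input by comparing final answers only."""
    # "Complete" the student's intermediate equation by solving it.
    student_answer = solve_quadratic(*student_input)
    if student_answer == solve_quadratic(*task):
        return "correct"
    # Compare against the final answer each buggy rule would produce,
    # without searching for a path of single steps between inputs.
    for name, rule in BUGGY_RULES.items():
        if student_answer == solve_quadratic(*rule(*task)):
            return name
    return None  # no diagnosis available


task = (1, 5, 6)  # x^2 + 5x + 6 = 0
print(diagnose(task, (1, 5, -6)))   # student reached x^2 + 5x - 6 = 0
print(diagnose(task, (2, 10, 12)))  # equivalent to the task
```

Because only final answers are compared, the number of candidates grows with the number of buggy rules rather than with the number of step sequences a student might have combined.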

HC · Nov 15, 2020

Model-Driven Synthesis for Programming Tutors

Niek Mulleners, Johan Jeuring, Bastiaan Heeren

When giving automated feedback to a student working on a beginner's exercise, many programming tutors run into a completeness problem. On the one hand, we want a student to experiment freely. On the other hand, we want a student to write her program in such a way that we can provide constructive feedback. We propose to investigate how we can overcome this problem by using program synthesis, which we use to generate correct solutions that closely match a student program, and give feedback based on the results.