HCApr 25, 2025
From Prompts to Propositions: A Logic-Based Lens on Student-LLM InteractionsAli Alfageeh, Sadegh AlMahdi Kazemi Zarkouei, Daye Nam et al.
Background and Context. The increasing integration of large language models (LLMs) in computing education presents an emerging challenge in understanding how students use LLMs and craft prompts to solve computational tasks. Prior research has used both qualitative and quantitative methods to analyze prompting behavior, but these approaches lack scalability or fail to effectively capture the semantic evolution of prompts. Objective. In this paper, we investigate whether students prompts can be systematically analyzed using propositional logic constraints. We examine whether this approach can identify patterns in prompt evolution, detect struggling students, and provide insights into effective and ineffective strategies. Method. We introduce Prompt2Constraints, a novel method that translates students prompts into logical constraints. The constraints are able to represent the intent of the prompts in succinct and quantifiable ways. We used this approach to analyze a dataset of 1,872 prompts from 203 students solving introductory programming tasks. Findings. We find that while successful and unsuccessful attempts tend to use a similar number of constraints overall, when students fail, they often modify their prompts more significantly, shifting problem-solving strategies midway. We also identify points where specific interventions could be most helpful to students for refining their prompts. Implications. This work offers a new and scalable way to detect students who struggle in solving natural language programming tasks. This work could be extended to investigate more complex tasks and integrated into programming tools to provide real-time support.
SEJul 1, 2021
Comparing Example-Based Collaborative Reflection to Problem Solving Practice for Learning during Team-Based Software Engineering ProjectsSreecharan Sankaranarayanan, Siddharth Reddy Kandimalla, Christopher Bogart et al.
Contributing to the literature on aptitude-treatment interactions between worked examples and problem-solving, this paper addresses differential learning from the two approaches when students are positioned as domain experts learning new concepts. Our evaluation is situated in a team project that is part of an advanced software engineering course. In this course, students who possess foundational domain knowledge but are learning new concepts engage alternatively in programming followed by worked example-based reflection. They are either allowed to finish programming or are curtailed after a pre-specified time to participate in a longer worked example-based reflection. We find significant pre- to post-test learning gains in both conditions. Then, we not only find significantly more learning when students participated in longer worked example-based reflections but also a significant performance improvement on a problem-solving transfer task. These findings suggest that domain experts learning new concepts benefit more from worked example-based reflections than from problem-solving.
SEAug 3, 2020
Understanding and Improving Artifact Sharing in Software Engineering ResearchChristopher S. Timperley, Lauren Herckis, Claire Le Goues et al.
In recent years, many software engineering researchers have begun to include artifacts alongside their research papers. Ideally, artifacts, including tools, benchmarks, and data, support the dissemination of ideas, provide evidence for research claims, and serve as a starting point for future research. However, in practice, artifacts suffer from a variety of issues that prevent the realization of their full potential. To help the software engineering community realize the full potential of artifacts, we seek to understand the challenges involved in the creation, sharing, and use of artifacts. To that end, we perform a mixed-methods study including a survey of artifacts in software engineering publications, and an online survey of 153 software engineering researchers. By analyzing the perspectives of artifact creators, users, and reviewers, we identify several high-level challenges that affect the quality of artifacts including mismatched expectations between these groups, and a lack of sufficient reward for both creators and reviewers. Using Diffusion of Innovations as an analytical framework, we examine how these challenges relate to one another, and build an understanding of the factors that affect the sharing and success of artifacts. Finally, we make recommendations to improve the quality of artifacts based on our results and existing best practices.
SEMar 26, 2020
Empirical Study of Restarted and Flaky Builds on Travis CIThomas Durieux, Claire Le Goues, Michael Hilton et al.
Continuous Integration (CI) is a development practice where developers frequently integrate code into a common codebase. After the code is integrated, the CI server runs a test suite and other tools to produce a set of reports (e.g., output of linters and tests). If the result of a CI test run is unexpected, developers have the option to manually restart the build, re-running the same test suite on the same code; this can reveal build flakiness, if the restarted build outcome differs from the original build. In this study, we analyze restarted builds, flaky builds, and their impact on the development workflow. We observe that developers restart at least 1.72% of builds, amounting to 56,522 restarted builds in our Travis CI dataset. We observe that more mature and more complex projects are more likely to include restarted builds. The restarted builds are mostly builds that are initially failing due to a test, network problem, or a Travis CI limitations such as execution timeout. Finally, we observe that restarted builds have a major impact on development workflow. Indeed, in 54.42% of the restarted builds, the developers analyze and restart a build within an hour of the initial failure. This suggests that developers wait for CI results, interrupting their workflow to address the issue. Restarted builds also slow down the merging of pull requests by a factor of three, bringing median merging time from 16h to 48h.