Lutz Prechelt

SE
8 papers
86 citations
Novelty 21%
AI Score 37

8 Papers

96.1 SE May 10
Guidelines for Empirical Studies in Software Engineering involving Large Language Models

Sebastian Baltes, Florian Angermeir, Chetan Arora et al.

Large Language Models (LLMs) are widely used in software engineering (SE) research and practice, yet their non-determinism, opaque training data, and rapidly evolving models threaten the reproducibility and replicability of empirical studies. We address this challenge through a collaborative effort of 22 researchers, presenting a taxonomy of seven study types that organizes how LLMs are used in SE research, together with eight guidelines for designing and reporting such studies. Each guideline distinguishes requirements (must) from recommended practices (should) and is contextualized by the study types it applies to. Our guidelines recommend that researchers: (1) declare LLM usage and role; (2) report model versions, configurations, and customizations; (3) document the tool architecture beyond the model; (4) disclose prompts, their development, and interaction logs; (5) validate LLM outputs with humans; (6) include an open LLM as a baseline; (7) use suitable baselines, benchmarks, and metrics; and (8) articulate limitations and mitigations. We complement the guidelines with an applicability matrix mapping guidelines to study types and a reporting checklist for authors and reviewers. We maintain the study types and guidelines online as a living resource for the community to use and shape (llm-guidelines.org).

6.1 SE Apr 16
Managing Power Gaps as an Element of Pair Programming Skill: A Grounded Theory

Linus Ververs, Janina Berger, Lutz Prechelt

Background: In pair programming, Togetherness (the partners understand each other's mental state well) is a main success factor. Maintaining high Togetherness is an element of pair programming skill. Some sessions appear to go badly although Togetherness appears good. Objective: Understand under what circumstances this is possible. Method: Grounded Theory Methodology based on 21 recorded pair programming sessions with 22 developers from 5 German software companies and 6 interviews with different developers from 4 other German companies. Results: We explain how a Power Gap can make a session dysfunctional despite the presence of high Togetherness, how it comes into existence due to a Knowledge Gap and Hierarchical Behavior, why its consequences (Defensive Behavior and Disengaging Behavior) are problematic, and how it can be reduced or prevented by Equalizing Behavior. Conclusions: Pair programming practitioners can improve their pair programming skill by unlearning problematic behaviors related to Power Gaps and by learning to recognize Power Gaps and apply Equalizing Behavior.

SE Feb 12, 2021
Two Elements of Pair Programming Skill

Franz Zieris, Lutz Prechelt

Background: Pair programming (PP) can have many benefits in industry. Researchers and practitioners recognize that successful and productive PP involves some skill that might take time to learn and improve. Question: What are the elements of pair programming skill? Method: We perform qualitative analyses of industrial pair programming sessions following the Grounded Theory Methodology. We look for patterns of problematic behavior to conceptualize key elements of what 'good' and 'bad' pairs do differently. Results: Here, we report two elements of pair programming skill: Good pairs (1) manage to maintain their Togetherness and (2) keep an eye on their session's Expediency. We identify three problematic behavioral patterns that affect one or both of these elements: Getting Lost in the Weeds, Losing the Partner, and Drowning the Partner. Conclusion: Pair programming skill is separate from general software development skill. Years of PP experience are neither a prerequisite nor sufficient for successful pair programming.

SE Feb 8, 2020
PP-ind: Description of a Repository of Industrial Pair Programming Research Data

Franz Zieris, Lutz Prechelt

PP-ind is a repository of audio-video-recordings of industrial pair programming sessions. Since 2007, our research group has collected data in 13 companies. A total of 57 developers worked together (mostly in groups of two, but also three or four) in 67 sessions with a mean length of 1:35 hours. In this report, we describe how we collected the data and provide summaries and characterizations of the sessions.

SE Nov 25, 2019
Does ICSE Accept the Right Contributions?

Lutz Prechelt

Background: There is a constant discussion regarding whether the ICSE Technical Research track is accepting too many contributions of some type and too few of some other type. Questions: Are ICSE and the contributions it is seeing well aligned with what is important for bringing software engineering forward? Method: 26 expert interviews with senior members of the ICSE community, evaluated qualitatively and reported with many quotations. Results: About three quarters of the respondents are not generally happy with ICSE's alignment. Two specific complaints that recur frequently concern a) many low-relevance contributions making it into the program and b) several types of high-relevance contributions hardly seen in the ICSE program.

SE Nov 22, 2019
Four presumed gaps in the software engineering research community's knowledge

Lutz Prechelt

Background: The state of the art in software engineering consists of a myriad of contributions and the gaps between them; it is difficult to characterize. Questions: In order to help understanding the state of the art, can we identify gaps in our knowledge that are at a very general, widely relevant level? Which research directions do these gaps suggest? Method: 54 expert interviews with senior members of the ICSE community, evaluated qualitatively using elements of Grounded Theory Methodology. Results: Our understanding of complexity, of good-enoughness, and of developers' strengths is underdeveloped. Some other relevant factors' relevance is apparently not clear. Software engineering is not yet an evidence-based discipline. Conclusion: More software engineering research should concern itself with emergence phenomena, with how engineering tradeoffs are made, with the assumptions underlying research works, and with creating certain taxonomies. Such work would also allow software engineering to become more evidence-based.

SE Jun 22, 2017
A Community's Perspective on the Status and Future of Peer Review in Software Engineering

Lutz Prechelt, Daniel Graziotin, Daniel Méndez Fernández

Context: Pre-publication peer review of scientific articles is considered a key element of the research process in software engineering, yet it is often perceived as not to work fully well. Objective: We aim at understanding the perceptions of and attitudes towards peer review of authors and reviewers at one of software engineering's most prestigious venues, the International Conference on Software Engineering (ICSE). Method: We invited 932 ICSE 2014/15/16 authors and reviewers to participate in a survey with 10 closed and 9 open questions. Results: We present a multitude of results, such as: Respondents perceive only one third of all reviews to be good, yet one third as useless or misleading; they propose double-blind or zero-blind reviewing regimes for improvement; they would like to see showable proofs of (good) reviewing work be introduced; attitude change trends are weak. Conclusion: The perception of the current state of software engineering peer review is fairly negative. Also, we found hardly any trend that suggests reviewing will improve by itself over time; the community will have to make explicit efforts. Fortunately, our (mostly senior) respondents appear more open for trying different peer reviewing regimes than we had expected.

SE Nov 25, 2013
Distributed-Pair Programming can work well and is not just Distributed Pair-Programming

Julia Schenk, Lutz Prechelt, Stephan Salinger

Background: Distributed Pair Programming can be performed via screensharing or via a distributed IDE. The latter offers the freedom of concurrent editing (which may be helpful or damaging) and has even more awareness deficits than screen sharing. Objective: Characterize how competent distributed pair programmers may handle this additional freedom and these additional awareness deficits and characterize the impacts on the pair programming process. Method: A revelatory case study, based on direct observation of a single, highly competent distributed pair of industrial software developers during a 3-day collaboration. We use recordings of these sessions and conceptualize the phenomena seen. Results: 1. Skilled pairs may bridge the awareness deficits without visible obstruction of the overall process. 2. Skilled pairs may use the additional editing freedom in a useful limited fashion, resulting in potentially better fluency of the process than local pair programming. Conclusion: When applied skillfully in an appropriate context, distributed-pair programming can (not will!) work at least as well as local pair programming.