68.4 SE Apr 13
Taking a Pulse on How Generative AI is Reshaping the Software Engineering Research Landscape
Bianca Trinkenreich, Fabio Calefato, Kelly Blincoe et al.
Context: Software engineering (SE) researchers increasingly study Generative AI (GenAI) while also incorporating it into their own research practices. Despite rapid adoption, there is limited empirical evidence on how GenAI is used in SE research and its implications for research practices and governance. Aims: We conduct a large-scale survey of 457 SE researchers publishing in top venues between 2023 and 2025. Method: Using quantitative and qualitative analyses, we examine who uses GenAI and why, where it is used across research activities, and how researchers perceive its benefits, opportunities, challenges, risks, and governance. Results: GenAI use is widespread, with many researchers reporting pressure to adopt and align their work with it. Usage is concentrated in writing and early-stage activities, while methodological and analytical tasks remain largely human-driven. Although productivity gains are widely perceived, concerns about trust, correctness, and regulatory uncertainty persist. Researchers highlight risks such as inaccuracies and bias, emphasize mitigation through human oversight and verification, and call for clearer governance, including guidance on responsible use and peer review. Conclusion: We provide a fine-grained, SE-specific characterization of GenAI use across research activities, along with taxonomies of GenAI use cases for research and peer review, opportunities, risks, mitigation strategies, and governance needs. These findings establish an empirical baseline for the responsible integration of GenAI into academic practice.
68.9 SE Mar 24
From Technical Debt to Cognitive and Intent Debt: Rethinking Software Health in the Age of AI
Margaret-Anne Storey
Generative AI is accelerating software development, but may quietly shift where the real risks lie. As AI generates code faster than teams can understand it, two underappreciated forms of debt accumulate: cognitive debt, the erosion of shared understanding across a team, and intent debt, the absence of externalized rationale that both developers and AI agents need to work safely with code. This article proposes a Triple Debt Model for reasoning about software health built around three interacting debt types: technical debt in code, cognitive debt in people, and intent debt in externalized knowledge. Cognitive debt concerns what people understand; intent debt concerns what is explicitly captured for humans and machines to use. We discuss how generative AI changes the relative importance of these debt types and how each can be diagnosed and mitigated, and we surface points of debate for practitioners.
35.8 SE May 11
ChatGPT: Friend or Foe When Comprehending and Changing Unfamiliar Code
Norman Anderson, Tarek Alakmeh, Victoria Jackson et al.
A rapidly growing body of research is examining how LLMs influence developers when they code. To date, this research has tended to focus on productivity and code quality outcomes, rather than the underlying cognitive processes involved in programming. To address this gap, we report on the results of an exploratory laboratory study of ten advanced student developers (five with support from AI and five without) who had to make a non-trivial extension to a sizable software system. Leveraging Polya's four problem-solving phases and 25 inductively-generated codes detailing distinct problem-solving behaviors as the primary lenses, we examined: (1) how AI impacted the problem-solving approach the developers used to solve the programming task, and (2) how AI impacted their progress when they became stuck. For the analysis, we triangulated data across multiple sources (e.g., think-aloud, code changes, web searches, and LLM prompts). Unexpectedly, while developers in the AI group repeatedly turned to the AI tool to offload certain aspects of the process, all detailed problem-solving behaviors appeared in both groups. We also found that nine out of ten participants found themselves stuck in their work, but with key differences in how they became stuck and unstuck. We identify seven distinct causes for being stuck and highlight how AI in some cases helped and in other cases hindered becoming unstuck.
SE Feb 12, 2025
Generative AI and Empirical Software Engineering: A Paradigm Shift
Christoph Treude, Margaret-Anne Storey
The adoption of large language models (LLMs) and autonomous agents in software engineering marks an enduring paradigm shift. These systems create new opportunities for tool design, workflow orchestration, and empirical observation, while fundamentally reshaping the roles of developers and the artifacts they produce. Although traditional empirical methods remain central to software engineering research, the rapid evolution of AI introduces new data modalities, alters causal assumptions, and challenges foundational constructs such as "developer", "artifact", and "interaction". As humans and AI agents increasingly co-create, the boundaries between social and technical actors blur, and the reproducibility of findings becomes contingent on model updates and prompt contexts. This vision paper examines how the integration of LLMs into software engineering disrupts established research paradigms. We discuss how it transforms the phenomena we study, the methods and theories we rely on, the data we analyze, and the threats to validity that arise in dynamic AI-mediated environments. Our aim is to help the empirical software engineering community adapt its questions, instruments, and validation standards to a future in which AI systems are not merely tools, but active collaborators shaping software engineering and its study.
SE Jun 15, 2025
Get on the Train or be Left on the Station: Using LLMs for Software Engineering Research
Bianca Trinkenreich, Fabio Calefato, Geir Hanssen et al.
The adoption of Large Language Models (LLMs) is not only transforming software engineering (SE) practice but is also poised to fundamentally disrupt how research is conducted in the field. While perspectives on this transformation range from viewing LLMs as mere productivity tools to considering them revolutionary forces, we argue that the SE research community must proactively engage with and shape the integration of LLMs into research practices, emphasizing human agency in this transformation. As LLMs rapidly become integral to SE research - both as tools that support investigations and as subjects of study - a human-centric perspective is essential. Ensuring human oversight and interpretability is necessary for upholding scientific rigor, fostering ethical responsibility, and driving advancements in the field. Drawing from discussions at the 2nd Copenhagen Symposium on Human-Centered AI in SE, this position paper employs McLuhan's Tetrad of Media Laws to analyze the impact of LLMs on SE research. Through this theoretical lens, we examine how LLMs enhance research capabilities through accelerated ideation and automated processes, make some traditional research practices obsolete, retrieve valuable aspects of historical research approaches, and risk reversal effects when taken to extremes. Our analysis reveals opportunities for innovation and potential pitfalls that require careful consideration. We conclude with a call to action for the SE research community to proactively harness the benefits of LLMs while developing frameworks and guidelines to mitigate their risks, to ensure continued rigor and impact of research in an AI-augmented future.
SE Nov 8, 2021
How Developers and Managers Define and Trade Productivity for Quality
Margaret-Anne Storey, Brian Houck, Thomas Zimmermann
In this paper, we present the findings from a survey study to investigate how developers and managers define and trade off developer productivity and software quality (two related lenses into software development). We found that developers and managers, as cohorts, are not well aligned in their views of what it means to be productive (developers think of productivity in terms of activity, while more managers think of productivity in terms of performance). We also found that developers are not accurate at predicting their managers' views of productivity. In terms of quality, we found that individual developers and managers have quite varied views of what quality means to them, but as cohorts they are closely aligned in their different views, with the majority in both groups defining quality in terms of robustness. Over half of the developers and managers reported that quality can be traded for higher productivity and explained why this trade-off can be justified, while one third consider quality as a necessary part of productivity that cannot be traded. We also present a new descriptive framework for quality, TRUCE, that we synthesize from the survey responses. We call for more discussion between developers and managers about what they each consider as important software quality attributes, and for open debate about how software quality relates to developer productivity and what trade-offs should or should not be made.
SE Mar 7, 2021
Uncovering the Benefits and Challenges of Continuous Integration Practices
Omar Elazhary, Colin Werner, Ze Shi Li et al.
In 2006, Fowler and Foemmel defined ten core Continuous Integration (CI) practices that could increase the speed of software development feedback cycles and improve software quality. Since then, these practices have been widely adopted by industry and subsequent research has shown they improve software quality. However, there is poor understanding of how organizations implement these practices, of the benefits developers perceive they bring, and of the challenges developers and organizations experience in implementing them. In this paper, we discuss a multiple-case study of three small- to medium-sized companies using the recommended suite of ten CI practices. Using interviews and activity log mining, we learned that these practices are broadly implemented but how they are implemented varies depending on their perceived benefits, the context of the project, and the CI tools used by the organization. We also discovered that CI practices can create new constraints on the software process that hurt feedback cycle time. For researchers, we show that how CI is implemented varies, and thus studying CI (for example, using data mining) requires understanding these differences as important context for research studies. For practitioners, our findings reveal in-depth insights on the possible benefits and challenges from using the ten practices, and how project context matters.
SE Feb 13, 2021
ADEPT: A Socio-Technical Theory of Continuous Integration
Omar Elazhary, Margaret-Anne Storey, Neil A. Ernst et al.
Continuous practices that rely on automation in the software development workflow have been widely adopted by industry for over a decade. Despite this widespread use, software development remains a primarily human-driven activity that is highly creative and collaborative. There has been extensive research on how continuous practices rely on automation and its impact on software quality and development velocity, but relatively little has been done to understand how automation impacts developer behavior and collaboration. In this paper, we introduce a socio-technical theory about continuous practices. The ADEPT theory combines constructs that include humans, processes, documentation, automation and the project environment, and describes propositions that relate these constructs. The theory was derived from phenomena observed in previous empirical studies. We show how the ADEPT theory can explain and describe existing continuous practices in software development, and how it can be used to generate new propositions for future studies to understand continuous practices and their impact on the social and technical aspects of software development.
SE Jan 14, 2021
"How Was Your Weekend?" Software Development Teams Working From Home During COVID-19
Courtney Miller, Paige Rodeghero, Margaret-Anne Storey et al.
The mass shift to working at home during the COVID-19 pandemic radically changed the way many software development teams collaborate and communicate. To investigate how team culture and team productivity may also have been affected, we conducted two surveys at a large software company. The first, an exploratory survey during the early months of the pandemic with 2,265 developer responses, revealed that many developers faced challenges reaching milestones and that their team productivity had changed. We also found through qualitative analysis that important team culture factors such as communication and social connection had been affected. For example, the simple phrase "How was your weekend?" had become a subtle way to show peer support. In our second survey, we conducted a quantitative analysis of the team cultural factors that emerged from our first survey to understand the prevalence of the reported changes. From 608 developer responses, we found that 74% of these respondents missed social interactions with colleagues and 51% reported a decrease in their communication ease with colleagues. We used data from the second survey to build a regression model to identify important team culture factors for modeling team productivity. We found that the ability to brainstorm with colleagues, difficulty communicating with colleagues, and satisfaction with interactions from social activities are important factors that are associated with how developers report their software development team's productivity. Our findings inform how managers and leaders in large software companies can support sustained team productivity during times of crisis and beyond.
SE Oct 7, 2020
Empirical Standards for Software Engineering Research
Paul Ralph, Nauman bin Ali, Sebastian Baltes et al.
Empirical Standards are natural-language models of a scientific community's expectations for a specific kind of study (e.g. a questionnaire survey). The ACM SIGSOFT Paper and Peer Review Quality Initiative generated empirical standards for research methods commonly used in software engineering. These living documents, which should be continuously revised to reflect evolving consensus around research best practices, will improve research quality and make peer review more effective, reliable, transparent and fair.
SE Aug 25, 2020
A Tale of Two Cities: Software Developers Working from Home During the COVID-19 Pandemic
Denae Ford, Margaret-Anne Storey, Thomas Zimmermann et al.
The COVID-19 pandemic has shaken the world to its core and has provoked an overnight exodus of developers who normally worked in an office setting to working from home. The magnitude of this shift and the factors that have accompanied this new unplanned work setting go beyond what the software engineering community has previously understood to be remote work. To find out how developers and their productivity were affected, we distributed two surveys weeks apart (with a combined total of 3,634 responses that answered all required questions) to understand the presence and prevalence of the benefits, challenges, and opportunities to improve this special circumstance of remote work. From our thematic qualitative analysis and statistical quantitative analysis, we find that there is a dichotomy of developer experiences influenced by many different factors (that for some are a benefit, while for others a challenge). For example, a benefit for some was being close to family members, but for others, having family members share their working space and interrupt their focus was a challenge. Our surveys led to powerful narratives from respondents and revealed the scale at which these experiences exist to provide insights as to how the future of (pandemic) remote work can evolve.
SE May 27, 2020
Code Duplication and Reuse in Jupyter Notebooks
Andreas Koenzen, Neil Ernst, Margaret-Anne Storey
Duplicating one's own code makes it faster to write software. This expediency is particularly valuable for users of computational notebooks. Duplication allows notebook users to quickly test hypotheses and iterate over data. In this paper, we explore how much, how and from where code duplication occurs in computational notebooks, and identify potential barriers to code reuse. Previous work in the area of computational notebooks describes developers' motivations for reuse and duplication but does not show how much reuse occurs or which barriers they face when reusing code. To address this gap, we first analyzed GitHub repositories for code duplicates contained in a repository's Jupyter notebooks, and then conducted an observational user study of code reuse, where participants solved specific tasks using notebooks. Our findings reveal that repositories in our sample have a mean self-duplication rate of 7.6%. However, in our user study, few participants duplicated their own code, preferring to reuse code from online sources.
SE Aug 6, 2019
Do as I Do, Not as I Say: Do Contribution Guidelines Match the GitHub Contribution Process?
Omar Elazhary, Margaret-Anne Storey, Neil Ernst et al.
Developer contribution guidelines are used in social coding sites like GitHub to explain and shape the process a project expects contributors to follow. They set standards for all participants and "save time and hassle caused by improperly created pull requests or issues that have to be rejected and resubmitted" (GitHub). Yet, we lack a systematic understanding of the content of a typical contribution guideline, as well as the extent to which these guidelines are followed in practice. Additionally, understanding how guidelines may impact projects that use Continuous Integration as part of the contribution process is of particular interest. To address this knowledge gap, we conducted a mixed-methods study of 53 GitHub projects with explicit contribution guidelines and coded the guidelines to extract key themes. We then created a process model using GitHub activity data (e.g., commit, new issue, new pull request) to compare the actual activity with the prescribed contribution guidelines. We show that approximately 68% of these projects diverge significantly from the expected process.
SE May 30, 2019
The Who, What, How of Software Engineering Research: A Socio-Technical Framework
Margaret-Anne Storey, Neil A. Ernst, Courtney Williams et al.
Software engineering is a socio-technical endeavor, and while many of our contributions focus on technical aspects, human stakeholders such as software developers are directly affected by and can benefit from our research and tool innovations. In this paper, we question how much of our research addresses human and social issues, and explore how much we study human and social aspects in our research designs. To answer these questions, we developed a socio-technical research framework to capture the main beneficiary of a research study (the who), the main type of research contribution produced (the what), and the research strategies used in the study (how we methodologically approach delivering relevant results given the who and what of our studies). We used this Who-What-How framework to analyze 151 papers from two well-cited publishing venues (the main technical track at the International Conference on Software Engineering, and the Empirical Software Engineering Journal by Springer) to assess how much this published research explicitly considers human aspects. We find that although a majority of these papers claim the contained research should benefit human stakeholders, most focus on technical contributions without engaging humans in their studies. Although our analysis is scoped to two venues, our results suggest a need for more diversification and triangulation of research strategies. In particular, there is a need for strategies that aim at a deeper understanding of human and social aspects of software development practice to balance the design and evaluation of technical innovations. We recommend that the framework should be used in the design of future studies in order to nudge software engineering research towards explicitly including human and social concerns in their designs, and to improve the relevance of our research for human stakeholders.
SE Apr 29, 2019
How software engineering research aligns with design science: A review
Emelie Engström, Margaret-Anne Storey, Per Runeson et al.
Background: Assessing and communicating software engineering research can be challenging. Design science is recognized as an appropriate research paradigm for applied research but is seldom referred to in software engineering. Applying the design science lens to software engineering research may improve the assessment and communication of research contributions. Aim: The aim of this study is 1) to understand whether the design science lens helps summarize and assess software engineering research contributions, and 2) to characterize different types of design science contributions in the software engineering literature. Method: In previous research, we developed a visual abstract template, summarizing the core constructs of the design science paradigm. In this study, we use this template in a review of a set of 38 top software engineering publications to extract and analyze their design science contributions. Results: We identified five clusters of papers, classifying them according to their alignment with the design science paradigm. Conclusions: The design science lens helps emphasize the theoretical contribution of research output, in terms of technological rules, and reflect on the practical relevance, novelty, and rigor of the rules proposed by the research.
SE Feb 8, 2018
Gamification: a Game Changer for Managing Technical Debt? A Design Study
Matthieu Foucault, Xavier Blanc, Margaret-Anne Storey et al.
Context: Technical debt management is challenging for software engineers due to poor tool support and a lack of knowledge on how to prioritize technical debt repayment and prevention activities. Furthermore, when there is a large backlog of debt, developers often lack the motivation to address it. Objective: In this paper, we describe a design study to investigate how gamification can support Technical Debt Management in a large legacy software system of an industrial company. Our study leads to a novel tool (named Themis) that combines technical debt support, version control, and gamification features. In addition to gamification features, Themis provides suggestions for developers on where to focus their effort, and visualizations for managers to track technical debt activities. Method: We describe how Themis was refined and validated in an iterative deployment with the company, finally conducting a qualitative study to investigate how the features of Themis affect technical debt management behavior. We consider the impact on both developers and managers. Results: Our results show that Themis increases developer motivation and supports managers in monitoring and influencing developer behaviors. We show how our findings may be transferable to other contexts by proposing guidelines on how to apply gamification. Conclusions: In this case, gamification appears to be a promising solution to help technical debt management, although it needs to be carefully designed and implemented to avoid its possible negative effects.
HC Feb 22, 2017
How Software Developers Mitigate Collaboration Friction with Chatbots
Carlene Lebeuf, Margaret-Anne Storey, Alexey Zagalsky
Modern software developers rely on an extensive set of social media tools and communication channels. The adoption of team communication platforms has led to the emergence of conversation-based tools and integrations, many of which are chatbots. Understanding how software developers manage their complex constellation of collaborators in conjunction with the practices and tools they use can bring valuable insights into socio-technical collaborative work in software development and other knowledge work domains. In this paper, we explore how chatbots can help reduce the friction points software developers face when working collaboratively. Using a socio-technical model for collaborative work, we identify three main areas for conflict: friction stemming from team interactions with each other, an individual's interactions with technology, and team interactions with technology. Finally, we provide a set of open questions for discussion within the research community.