SE, Jan 25
Political and Ideological Pressure in Software Engineering Research: The Case of DEI Backlash
Sonja M. Hyrynsalmi, Chris Brown, Alexander Serebrenik et al.
Political and ideological pressures shape global research. Recently, these pressures have become particularly visible in research related to diversity, equity, and inclusion (DEI). Drastic changes in national funding and governmental guidance, especially in the US, have affected the global software engineering research ecosystem. The impacts of these pressures on research are not always direct, as they operate at multiple levels. However, what is clear is that these pressures affect every field, including software engineering (SE), despite the belief that our field is politically and ideologically neutral. In this position paper, we examine cases of political and ideological pressures on the SE research ecosystem. We investigate the community's perceptions of political and ideological pressures by analyzing community survey responses and outlining case examples of DEI backlash in SE research across three levels: macro, meso, and micro. Our research shows how recent political and ideological pressures have affected SE research across these levels, and, as a result, we propose actionable steps for the community to address these issues at different levels.
SE, Mar 25, 2021
Quality Gatekeepers: Investigating the Effects of Code Review Bots on Pull Request Activities
Mairieli Wessel, Alexander Serebrenik, Igor Wiese et al.
Software bots have been facilitating several development activities in Open Source Software (OSS) projects, including code review. However, these bots may bring unexpected impacts to group dynamics, as frequently occurs with new technology adoption. Understanding and anticipating such effects is important for planning and management. To analyze these effects, we investigate how several activity indicators change after the adoption of a code review bot. We employed a regression discontinuity design on 1,194 software projects from GitHub. We also interviewed 12 practitioners, including open-source maintainers and contributors. Our results indicate that the adoption of code review bots increases the number of monthly merged pull requests, decreases monthly non-merged pull requests, and decreases communication among developers. From the developers' perspective, these effects are explained by the transparency and confidence the bot comments introduce, in addition to the changes in the discussion focused on pull requests. Practitioners and maintainers may leverage our results to understand, or even predict, bot effects on their projects.
SE, Jul 27, 2020
Work Practices and Perceptions from Women Core Developers in OSS Communities
Edna Dias Canedo, Rodrigo Bonifácio, Márcio Vinícius Okimoto et al.
The effect of gender diversity in open source communities has gained increasing attention from practitioners and researchers. For instance, organizations such as the Python Software Foundation and the OpenStack Foundation have started initiatives to increase gender diversity and promote women to top positions in the communities. Although the general underrepresentation of women (a.k.a. horizontal segregation) in open source communities has been explored in a number of research studies, little is known about the vertical segregation in open source communities -- which occurs when there are fewer women in high-level positions. To address this research gap, in this paper we present the results of a mixed-methods study on gender diversity and work practices of core developers contributing to open-source communities. In the first study, we used mining software repositories procedures to identify the core developers of 711 open source projects, in order to understand how common women core developers are in open source communities and to characterize their work practices. In the second study, we surveyed the women core developers we identified in the first study to collect their perceptions of gender diversity and gender bias they might have observed while contributing to open source systems. Our findings show that open source communities present both horizontal and vertical segregation (only 2.3% of the core developers are women). Nevertheless, unlike in previous studies, most of the women core developers (65.7%) report never having experienced gender discrimination when contributing to an open source project. Finally, we did not note substantial differences between the work practices of women and men core developers. We reflect on these findings and present some ideas that might increase the participation of women in open source communities.
SE, Jun 19, 2019
On the abandonment and survival of open source projects: An empirical investigation
Guilherme Avelino, Eleni Constantinou, Marco Tulio Valente et al.
Background: Evolution of open source projects frequently depends on a small number of core developers. The loss of such core developers might be detrimental for projects and even threaten their entire continuation. However, it is possible that new core developers assume the project maintenance and allow the project to survive. Aims: The objective of this paper is to provide empirical evidence on: 1) the frequency of project abandonment and survival, 2) the differences between abandoned and surviving projects, and 3) the motivation and difficulties faced when assuming an abandoned project. Method: We adopt a mixed-methods approach to investigate project abandonment and survival. We carefully select 1,932 popular GitHub projects and recover the abandoned and surviving projects, and conduct a survey with developers who have been instrumental in the survival of the projects. Results: We found that 315 projects (16%) were abandoned and 128 of these projects (41%) survived because of new core developers who assumed the project development. The survey indicates that (i) in most cases the new maintainers were aware of the project abandonment risks when they started to contribute; (ii) their own usage of the systems is the main motivation to contribute to such projects; (iii) human and social factors played a key role when making these contributions; and (iv) lack of time and the difficulty of obtaining push access to the repositories are the main barriers they faced. Conclusions: Project abandonment is a reality even in large open source projects, and our work enables a better understanding of such risks, as well as highlighting ways of avoiding them.
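The two percentages in the Results follow directly from the counts given in the abstract. A quick arithmetic check in Python (the variable names are illustrative, not taken from the paper):

```python
# Counts taken from the abstract; variable names are illustrative only.
selected_projects = 1932  # popular GitHub projects selected for the study
abandoned = 315           # projects classified as abandoned
survived = 128            # abandoned projects that survived thanks to new core developers

abandonment_rate = abandoned / selected_projects  # share of all selected projects
survival_rate = survived / abandoned              # share of the abandoned projects

print(f"abandoned: {abandonment_rate:.0%} of selected projects")
print(f"survived:  {survival_rate:.0%} of abandoned projects")
```

Note that the 41% is relative to the 315 abandoned projects, not to the full set of 1,932.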
SE, Dec 6, 2015
Continuous integration in a social-coding world: Empirical evidence from GitHub. **Updated version with corrections**
Bogdan Vasilescu, Stef van Schuylenburg, Jules Wulms et al.
Continuous integration is a software engineering practice of frequently merging all developer working copies with a shared main branch, e.g., several times a day. With the advent of GitHub, a platform well known for its "social coding" features that aid collaboration and sharing, and currently the largest code host in the open source world, collaborative software development has never been more prominent. In GitHub development one can distinguish between two types of developer contributions to a project: direct ones, coming from a typically small group of developers with write access to the main project repository, and indirect ones, coming from developers who fork the main repository, update their copies locally, and submit pull requests for review and merger. In this paper we explore how GitHub developers use continuous integration as well as whether the contribution type (direct versus indirect) and different project characteristics (e.g., main programming language, or project age) are associated with the success of the automatic builds.
SE, Jan 23
Ethics of Care for Software Engineering
Alexander Serebrenik, Sebastian Baltes
Software engineering researchers repeatedly argue that the impact of their research on industrial practice, while desired and intended, is rarely achieved. We believe that a possible explanation of this phenomenon is the opposition of "caring about" and "caring for", based on the ethics of care. Indeed, while software engineering is collaborative and hence builds on interpersonal relations, researchers tend to care about "industrial impact" and "practitioners" in abstract terms, but rarely care for specific individuals working in specific contexts facing specific challenges. In this position paper, we advocate for the adoption of ethics of care in software engineering and discuss the implications of this adoption for researchers and conference organizers.
SE, Mar 31
HackRep: A Large-Scale Dataset of GitHub Hackathon Projects
Sjoerd Halmans, Lavinia Paganini, Alexander Serebrenik et al.
Hackathons are time-bound collaborative events that often target software creation. Although hackathons have been studied in the past, existing work has focused on in-depth case studies, limiting our understanding of hackathons as a software engineering activity. To complement the existing body of knowledge, we introduce HackRep, a dataset of 100,356 hackathon GitHub repositories. We illustrate the ways HackRep can benefit software engineering researchers by presenting a preliminary investigation of hackathon project continuation, hackathon team composition, and an estimation of hackathon geography. We further demonstrate the opportunities of using this dataset, for instance by showing the possibility of estimating hackathon durations based on commit timestamps.
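The abstract does not describe how durations are estimated from commit timestamps. One simple possibility, shown here purely as an assumption, is to take the span between the earliest and latest commit of a repository. The function name and sample timestamps are hypothetical, and all timestamps are assumed to share one timezone.

```python
from datetime import datetime, timedelta
from typing import Iterable


def estimate_duration(commit_times: Iterable[datetime]) -> timedelta:
    """Span between the earliest and latest commit; zero for fewer than two commits."""
    times = sorted(commit_times)
    if len(times) < 2:
        return timedelta(0)
    return times[-1] - times[0]


# Hypothetical commit timestamps for one repository.
commits = [
    datetime(2024, 3, 1, 18, 0),
    datetime(2024, 3, 2, 9, 30),
    datetime(2024, 3, 3, 12, 0),
]
print(estimate_duration(commits))  # 1 day, 18:00:00
```

A sketch like this would overestimate the duration whenever a team keeps committing after the event, so a real analysis would likely need to filter or cluster commits first.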
SE, Nov 12, 2020
A Fine-grained Data Set and Analysis of Tangling in Bug Fixing Commits
Steffen Herbold, Alexander Trautsch, Benjamin Ledel et al.
Context: Tangled commits are changes to software that address multiple concerns at once. For researchers interested in bugs, tangled commits mean that they actually study not only bugs, but also other concerns irrelevant for the study of bugs. Objective: We want to improve our understanding of the prevalence of tangling and the types of changes that are tangled within bug fixing commits. Methods: We use a crowd sourcing approach for manual labeling to validate which changes contribute to bug fixes for each line in bug fixing commits. Each line is labeled by four participants. If at least three participants agree on the same label, we have consensus. Results: We estimate that between 17% and 32% of all changes in bug fixing commits modify the source code to fix the underlying problem. However, when we only consider changes to the production code files, this ratio increases to 66% to 87%. We find that about 11% of lines are hard to label, leading to active disagreements between participants. Due to confirmed tangling and the uncertainty in our data, we estimate that 3% to 47% of data is noisy without manual untangling, depending on the use case. Conclusion: Tangled commits have a high prevalence in bug fixes and can lead to a large amount of noise in the data. Prior research indicates that this noise may alter results. As researchers, we should be skeptics and assume that unvalidated data is likely very noisy, until proven otherwise.
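The consensus rule in the Methods (four labels per line, consensus when at least three agree) can be sketched as follows. This is a minimal illustration, not the authors' tooling; the function name and the label names are hypothetical.

```python
from collections import Counter
from typing import Optional, Sequence


def consensus_label(labels: Sequence[str], min_agreement: int = 3) -> Optional[str]:
    """Return the label chosen by at least `min_agreement` participants, else None."""
    if not labels:
        return None
    label, count = Counter(labels).most_common(1)[0]
    return label if count >= min_agreement else None


# Hypothetical label names, for illustration only.
print(consensus_label(["bugfix", "bugfix", "bugfix", "test"]))        # bugfix
print(consensus_label(["bugfix", "bugfix", "refactoring", "test"]))   # None
```

With four participants and a threshold of three, at most one label can reach consensus, so the result is never ambiguous.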
SE, Oct 20, 2020
Assessment of Off-the-Shelf SE-specific Sentiment Analysis Tools: An Extended Replication Study
Nicole Novielli, Fabio Calefato, Filippo Lanubile et al.
Sentiment analysis methods have become popular for investigating human communication, including discussions related to software projects. Since general-purpose sentiment analysis tools do not fit well with the information exchanged by software developers, new tools, specific for software engineering (SE), have been developed. We investigate to what extent SE-specific tools for sentiment analysis mitigate the threats to conclusion validity of empirical studies in software engineering, highlighted by previous research. First, we replicate two studies addressing the role of sentiment in security discussions on GitHub and in question-writing on Stack Overflow. Then, we extend the previous studies by assessing to what extent the tools agree with each other and with the manual annotation on a gold standard of 600 documents. We find that different SE-specific sentiment analysis tools might lead to contradictory results at a fine-grain level, when used 'off-the-shelf'. Consequently, platform-specific tuning or retraining might be needed to take into account differences in platform conventions, jargon, or document lengths.
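The abstract does not name the agreement measure used. Cohen's kappa is one common way to quantify agreement between two annotators or tools beyond chance, and it is used here only as an assumption. The function name and sample labels are hypothetical.

```python
from collections import Counter
from typing import Sequence


def cohen_kappa(a: Sequence[str], b: Sequence[str]) -> float:
    """Cohen's kappa for two equally long label sequences over the same documents."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(a)
    # Observed agreement: fraction of documents with identical labels.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    freq_a, freq_b = Counter(a), Counter(b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        # Both raters used a single identical label throughout; treat as full agreement.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical outputs of two sentiment tools on four documents.
tool_a = ["positive", "negative", "neutral", "neutral"]
tool_b = ["positive", "neutral", "neutral", "neutral"]
print(round(cohen_kappa(tool_a, tool_b), 2))
```

The same function can compare a tool's output with the manual gold-standard annotation by passing the gold labels as one of the two sequences.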
SE, Apr 13, 2020
Is 40 the new 60? How popular media portrays the employability of older software developers
Sebastian Baltes, George Park, Alexander Serebrenik
Alerted by our previous research as well as media reports and discussions in online forums about ageism in the software industry, we set out to study the public discourse around age and software development. With a focus on the USA, we analyzed popular online articles and related discussions on Hacker News through the lens of (perceived) employability issues and potential mitigation strategies. Besides rather controversial strategies such as disguising age-related aspects in résumés or undergoing plastic surgery to appear young, we highlight the importance of keeping up-to-date and specializing in certain tasks or technologies, and we present role transitions as a way forward for veteran developers. With this article, we want to build awareness among decision makers in software projects to help them anticipate and mitigate challenges that their older employees may face.