SE | Apr 6, 2023
Tag that issue: Applying API-domain labels in issue tracking systems
Fabio Santos, Joseph Vargovich, Bianca Trinkenreich et al.
Labeling issues with the skills required to complete them can help contributors choose tasks in Open Source Software projects. However, manually labeling issues is time-consuming and error-prone, and current automated approaches are mostly limited to classifying issues as bugs/non-bugs. We investigate the feasibility and relevance of automatically labeling issues with what we call "API-domains," which are high-level categories of APIs. We posit that the APIs used in the source code affected by an issue can serve as a proxy for the type of skills (e.g., DB, security, UI) needed to work on the issue. We ran a user study (n=74) to assess the relevance of API-domain labels to potential contributors, leveraged the issues' descriptions and the project history to build prediction models, and validated the predictions with contributors (n=20) of the projects. Our results show that (i) newcomers to the project consider API-domain labels useful in choosing tasks, (ii) labels can be predicted with a precision of 84% and a recall of 78.6% on average, (iii) predictions reached up to 71.3% precision and 52.5% recall when training on one project and testing on another (transfer learning), and (iv) project contributors consider most of the predictions helpful in identifying needed skills. These findings suggest our approach can be applied in practice to automatically label issues, helping developers find tasks that better match their skills.
SE | May 7
Analyzing the Adoption of Database Management Systems Throughout the History of Open Source Projects
Camila A. Paiva, Raquel Maximino, Frederico Paiva et al.
Database Management Systems (DBMSs) are widely used to store, retrieve, and manage the data handled by modern applications. Although prior work has studied the co-evolution of DBMSs and application source code, less is known about DBMS adoption, co-use, and replacement in real systems. This paper presents a historical study of DBMS usage in 362 popular open-source Java projects hosted on GitHub. We investigated the adoption of the top DBMSs ranked by DB-Engines, covering relational and non-relational systems. Using source-code heuristics, we analyzed DBMS popularity, stability, migration patterns, co-occurrence, and the role of Object-Relational Mappers (ORMs). Our findings show that MySQL and PostgreSQL are the most popular DBMSs in our corpus. Among non-relational DBMSs, Redis and MongoDB are the most frequently used and tend to remain stable after adoption. In contrast, systems such as HyperSQL are more often replaced as projects evolve. We also observed frequent co-use of multiple DBMSs, suggesting patterns of polyglot persistence in which projects combine systems to handle different data needs. Finally, we found that ORM frameworks are commonly used to mediate interactions between applications and DBMSs. Overall, our study provides empirical evidence on how DBMSs are adopted, combined, and replaced over time, offering guidance for developers, architects, educators, and DBMS vendors.
HC | May 14
Usable but Conventional: An Empirical Study on the UX of AI-Generated Interface Prototypes
Karoline Romero, Igor Wiese, Renato Balancieiri et al.
This paper investigates User Experience (UX) with prototypes generated by Generative Artificial Intelligence (GenAI) tools. An empirical survey with 92 participants evaluated AI-generated and human-created prototypes without revealing their authorship beforehand. We measured UX using the UEQ-S, covering pragmatic and hedonic dimensions. Results indicate positive evaluations of pragmatic aspects, such as usability and efficiency, and neutral or negative evaluations of hedonic aspects, including originality and innovation. We conclude that GenAI can produce functional interfaces but tends to reinforce visual and structural patterns that affect perceptions of originality.
SE | May 18, 2021
Pots of Gold at the End of the Rainbow: What is Success for Open Source Contributors?
Bianca Trinkenreich, Mariam Guizani, Igor Wiese et al.
Success in Open Source Software (OSS) is often perceived as an exclusively code-centric endeavor. This perception can exclude a variety of individuals with a diverse set of skills and backgrounds, in turn contributing to the current diversity and inclusion imbalance in OSS. Because people's perspectives of success affect their personal, professional, and life choices, we must first understand what OSS contributors consider successful in order to support a diverse class of individuals. Thus far, research has used a uni-dimensional, code-centric lens to define success. In this paper, we challenge this status quo and reveal the multi-faceted definition of success among OSS contributors. We do so through interviews with 27 OSS contributors who are recognized as successful in their communities and a follow-up open survey with 193 OSS contributors. Our study provides nuanced definitions of success perceptions in OSS, which might help devise strategies to attract and retain a diverse set of contributors, helping them attain their "pots of gold at the end of the rainbow".
SE | May 18, 2021
Women's Participation in Open Source Software: A Survey of the Literature
Bianca Trinkenreich, Igor Wiese, Anita Sarma et al.
Participation of women in Open Source Software (OSS) remains highly unbalanced, despite various efforts to improve diversity. This is concerning not only because women miss the career and skill-development opportunities afforded by OSS, but also because OSS projects lose diversity of thought when their contributor base lacks diversity. Studies that characterize women's participation and investigate how to attract and retain women are spread across multiple fields, including information systems, software engineering, and social science. This paper systematically maps, aggregates, and synthesizes the state of the art on women's participation in Open Source Software. It focuses on women's representation and the demographics of women who contribute to OSS, how they contribute, the acceptance rates of their contributions, their motivations and challenges, and strategies employed by communities to attract and retain women. We identified 51 articles (published between 2005 and 2021) that investigate women's participation in OSS. According to the literature, women represent about 9.8% of OSS contributors; most of them are recent contributors, 20-37 years old, devote less than 5 hours per week to OSS, and make both code and non-code contributions. Only 5% of projects have women as core developers, and women author less than 5% of pull requests but have similar or even higher merge acceptance rates than men. Besides learning new skills and altruism, reciprocity and kinship are motivations especially relevant for women, who may leave if they are not compensated for their contributions. Women's challenges are mainly social, including lack of peer parity and non-inclusive communication from a toxic culture. The literature reports ten strategies, which were mapped to six of the seven challenges. Based on these results, we provide guidelines for future research and practice.
SE | Mar 25, 2021
Don't Disturb Me: Challenges of Interacting with Software Bots on Open Source Software Projects
Mairieli Wessel, Igor Wiese, Igor Steinmacher et al.
Software bots are used to streamline tasks in the pull requests of Open Source Software (OSS) projects, saving development cost, time, and effort. However, their presence can be disruptive to the community. We identified several challenges caused by bots in pull request interactions by interviewing 21 practitioners, including project maintainers, contributors, and bot developers. In particular, our findings identify noise as a recurrent and central problem. Noise affects both human communication and development workflow by overwhelming and distracting developers. Our main contribution is a theory of how human developers perceive annoying bot behaviors as noise on social coding platforms. This contribution may help practitioners understand the effects of adopting a bot, and researchers and tool designers may leverage our results to better support human-bot interaction on social coding platforms.
SE | Mar 25, 2021
Quality Gatekeepers: Investigating the Effects of Code Review Bots on Pull Request Activities
Mairieli Wessel, Alexander Serebrenik, Igor Wiese et al.
Software bots have been facilitating several development activities in Open Source Software (OSS) projects, including code review. However, these bots may bring unexpected impacts to group dynamics, as frequently occurs with new technology adoption. Understanding and anticipating such effects is important for planning and management. To analyze these effects, we investigate how several activity indicators change after the adoption of a code review bot. We employed a regression discontinuity design on 1,194 software projects from GitHub. We also interviewed 12 practitioners, including open-source maintainers and contributors. Our results indicate that the adoption of code review bots increases the number of monthly merged pull requests, decreases monthly non-merged pull requests, and decreases communication among developers. From the developers' perspective, these effects are explained by the transparency and confidence the bot comments introduce, in addition to the changes in the discussion focused on pull requests. Practitioners and maintainers may leverage our results to understand, or even predict, bot effects on their projects.
SE | Mar 23, 2021
Can I Solve It? Identifying APIs Required to Complete OSS Tasks
Fabio Santos, Igor Wiese, Bianca Trinkenreich et al.
Open Source Software projects add labels to open issues to help contributors choose tasks. However, manually labeling issues is time-consuming and error-prone. Current automatic approaches for creating labels are mostly limited to classifying issues as bugs or non-bugs. In this paper, we investigate the feasibility and relevance of labeling issues with the domain of the APIs required to complete the tasks. We leverage the issues' descriptions and the project history to build prediction models, which achieved precision up to 82% and recall up to 97.8%. We also ran a user study (n=74) to assess the relevance of these labels to potential contributors. The results show that the labels were useful to participants in choosing tasks, and the API-domain labels were selected more often than the existing architecture-based labels. Our results can inspire the creation of tools to automatically label issues, helping developers find tasks that better match their skills.
SE | Jan 25, 2021
The Shifting Sands of Motivation: Revisiting What Drives Contributors in Open Source
Marco Gerosa, Igor Wiese, Bianca Trinkenreich et al.
Open Source Software (OSS) has changed drastically over the last decade, with OSS projects now producing a large ecosystem of popular products, involving industry participation, and providing professional career opportunities. But our field's understanding of what motivates people to contribute to OSS is still fundamentally grounded in studies from the early 2000s. With the changed landscape of OSS, it is very likely that motivations to join OSS have also evolved. Through a survey of 242 OSS contributors, we investigate shifts in motivation from three perspectives: (1) the impact of the new OSS landscape, (2) the impact of individuals' personal growth as they become part of OSS communities, and (3) the impact of differences in individuals' demographics. Our results show that some motivations related to social aspects and reputation increased in frequency and that some intrinsic and internalized motivations, such as learning and intellectual stimulation, are still highly relevant. We also found that contributing to OSS often transforms extrinsic motivations to intrinsic, and that while experienced contributors often shift toward altruism, novices often shift toward career, fun, kinship, and learning. OSS projects can leverage our results to revisit current strategies to attract and retain contributors, and researchers and tool builders can better support the design of new studies and tools to engage and support OSS development.
SE | Dec 9, 2020
From One to Hundreds: Multi-Licensing in the JavaScript Ecosystem
João Pedro Moraes, Ivanilton Polato, Igor Wiese et al.
Open source licenses create a legal framework that plays a crucial role in the widespread adoption of open source projects. Without a license, source code available on the internet could not be openly (re)distributed. Although recent studies provide evidence that most popular open source projects have a license, developers might lack confidence or expertise when they need to combine software licenses, leading to a mistaken unification of the project license. License usage is further challenged by the high degree of reuse at the heart of modern software development practices, in which third-party libraries and frameworks are easily and quickly integrated into a software codebase. This scenario creates what we call "multi-licensed" projects: projects that have components licensed under more than one license. Although these components exist at the file level, they naturally affect licensing decisions at the project level. In this paper, we conducted a mixed-methods study to shed some light on these issues. We started by parsing 1,426,263 files (source code and non-source code) from 1,552 JavaScript projects, looking for license information. Among these projects, we observed that 947 (61%) employ more than one license. On average, there are 4.7 licenses per studied project (max: 256). One reason for multi-licensing is incorporating the source code of third-party libraries into the project's codebase. When doing so, 373 of the multi-licensed projects introduced at least one license incompatibility issue. We also surveyed 83 maintainers of these projects to cross-validate our findings. We observed that 63% of the surveyed maintainers are not aware of the implications of multi-licensing. Those who are aware adopt multiple licenses mostly to conform with the licenses of third-party libraries.
SE | Oct 13, 2019
Google Summer of Code: Student Motivations and Contributions
Jefferson O. Silva, Igor Wiese, Daniel M. German et al.
Several open source software (OSS) projects participate in engagement programs, such as Summers of Code, expecting to foster newcomers' onboarding and to receive contributions. However, there is little empirical evidence showing why students join such programs. In this paper, we study the well-established Google Summer of Code (GSoC), a 3-month OSS engagement program that offers stipends and mentors to students willing to contribute to OSS projects. We combined a survey (of students and mentors) and interviews (with students) to understand what motivates students to enter GSoC. Our results show that students enter GSoC for an enriching experience, not necessarily to become frequent contributors. Our data suggest that, while the stipends are an important motivator, the students participate for work experience and the ability to attach the name of the supporting organization to their resumés. We also discuss practical implications for students, mentors, OSS projects, and Summer of Code programs.
SE | Oct 8, 2021
A Mining Software Repository Extended Cookbook: Lessons learned from a literature review
Daniel Barros, Flavio Horita, Igor Wiese et al.
The main purpose of Mining Software Repositories (MSR) is to discover the latest enhancements and provide insight into how to make improvements in a software project. In light of this, this paper updates the findings of the original MSR Cookbook by first conducting a systematic mapping study to elicit and analyze the state of the art, and then proposing an extended version of the Cookbook. This extended Cookbook was built on four high-level themes, which were derived from the analysis of 112 selected studies. The extended Cookbook contributes to practice and research by: 1) including studies published in all available and relevant publication venues; 2) including and updating recommendations in all four high-level themes, with an 84% increase in comments compared with the original MSR Cookbook; 3) summarizing the tools employed for each high-level theme; and 4) providing lessons learned for future studies. Thus, the extended Cookbook can support new research projects, as upgraded recommendations and lessons learned are available with the aid of samples and tools.
SE | Jul 13, 2021
What Evidence We Would Miss If We Do Not Use Grey Literature?
Fernando Kamei, Gustavo Pinto, Igor Wiese et al.
Context: Over the last years, Grey Literature (GL) has been gaining attention in Secondary Studies in Software Engineering (SE). Notably, Multivocal Literature Review (MLR) studies, which search for evidence in both Traditional Literature (TL) and GL, are particularly benefiting from this rise of GL content. Despite the growing interest in MLR-based studies, the literature assessing how GL has contributed to MLR studies is still scarce. Objective: This research aims to assess how the use of GL contributed to MLR studies. By contributing, we mean the extent to which GL provides evidence that is actually used by an MLR to answer its research question. Method: We conducted a tertiary study to identify MLR studies published between 2017 and 2019, selecting nine MLR studies. Using qualitative and quantitative analysis, we identified the GL used and assessed to what extent these GL sources contributed to the MLR studies. Results: Our analysis identified that 1) GL provided evidence not found in TL, 2) most of the GL sources were used to provide recommendations to solve problems, explain a topic, and classify the findings, and 3) 19 different GL types were used in the studies; these GL sources were mainly produced by SE practitioners (including blog posts, slide presentations, or project descriptions). Conclusions: We show how GL contributed to MLR studies. We observed that if these GL sources had not been included in the MLRs, several findings would have been omitted or weakened. We also describe the challenges involved in conducting this investigation, along with potential ways to deal with them, which may help future SE researchers.
SE | Apr 27, 2021
Grey Literature in Software Engineering: A Critical Review
Fernando Kamei, Igor Wiese, Crescencio Lima et al.
Context: Grey Literature (GL) has recently grown in Software Engineering (SE) research owing to software engineers' increased use of online communication channels. However, there is still a limited understanding of how SE research takes advantage of GL. Objective: This research aimed to understand how SE researchers use GL in their secondary studies. Method: We conducted a tertiary study of studies published between 2011 and 2018 in high-quality software engineering conferences and journals. We then applied qualitative and quantitative analysis to investigate 446 potential studies. Results: Of the 446 selected studies, 126 cited GL, but only 95 of those used GL to answer a specific research question, representing almost 21% of all 446 secondary studies. Interestingly, we identified that few studies employed specific search mechanisms and used additional criteria for assessing GL. Moreover, at the time we conducted this research, 49% of the GL URLs were no longer working. Based on our findings, we discuss some challenges in using GL and potential mitigation plans. Conclusion: In this paper, we summarized the last 10 years of software engineering research that uses GL, showing that GL has been essential for bringing new practical perspectives that are scarce in traditional literature. By mapping the current landscape of use, we also raise awareness of related challenges (and strategies to deal with them).
SE | Sep 13, 2020
On the Use of Grey Literature: A Survey with the Brazilian Software Engineering Research Community
Fernando Kamei, Igor Wiese, Gustavo Pinto et al.
Background: The use of Grey Literature (GL) has been investigated in diverse research areas. In Software Engineering (SE), this topic has attracted increasing interest over the last years. Problem: Even with the growth of GL published in diverse sources, the understanding of its use in the SE research community is still controversial. Objective: To understand how Brazilian SE researchers use GL, we aimed to identify the criteria they use to assess its credibility, as well as its benefits and challenges. Method: We surveyed 76 active SE researchers who participate in a flagship SE conference in Brazil, using a questionnaire with 11 questions to collect their views on the use of GL in the context of SE research. We followed a qualitative approach to analyze the open questions. Results: We found that most surveyed researchers use GL mainly to understand new topics. Our work identified new findings, including: 1) GL sources used by SE researchers (e.g., blogs, community websites); 2) motivations to use GL (e.g., to understand problems and to complement research findings) or reasons to avoid it (e.g., lack of reliability, lack of scientific value); 3) the benefit that GL is easy to access and read, and the challenge of having its scientific value recognized; and 4) criteria to assess GL credibility, showing the importance of the content owner being renowned (e.g., renowned authors and institutions). Conclusions: Our findings contribute to forming a body of knowledge on the use of GL by SE researchers, by discussing novel (some contradictory) results and providing a set of lessons learned for both SE researchers and practitioners.
SE | Feb 3, 2020
Analyzing the evolution and diversity of SBES Program Committee
Fabio Pacheco, Igor Wiese, Bruno Cartaxo et al.
The Brazilian Symposium on Software Engineering (SBES) is one of the most important Latin American Software Engineering conferences. It was first held in 1987, and in 2019 it reached its 33rd edition. Over these years, many researchers have participated in SBES by attending the conference and by submitting and reviewing papers. The researchers who serve on the Program Committee (PC) and act as reviewers are fundamentally important to SBES, since their evaluations (e.g., deciding whether a paper is accepted) have the potential to shape what SBES is today. Knowing that diversity is an important aspect of any group work, we wanted to understand diversity in the SBES PC community. We investigated a number of characteristics of SBES PC members, including their gender and geographic location. We also analyzed the turnover and renewal of the committee. Among the findings, we observed that although the number of participants in the SBES PC has increased over the years, most of them are men (~80%) and from the Southeast and Northeast of Brazil, with very few members from the North region. We also observed little turnover: during the 2010s, only 11% of new members were added to the PC. Finally, we investigated the participation of PC members as authors of papers at SBES. We observed that only 24% of the papers accepted to SBES were authored by researchers who were not committee members in the respective year. Moreover, committee members usually do not collaborate among themselves: a significant number of the papers are authored by PC members and students. This paper may help the SBES community, in particular its special interest group, understand the needs and challenges of the PC's participants.