Gustavo Pinto

h-index32

21papers

411citations

Novelty22%

AI Score28

Ranked #146,890 of 194,257 authors (top 76%)#1,690 in SE (top 56%)

21 Papers

4.3SEJul 30, 2022Code

Automatically Categorising GitHub Repositories by Application Domain

Francisco Zanartu, Christoph Treude, Bruno Cartaxo et al.

GitHub is the largest host of open source software on the Internet. This large, freely accessible database has attracted the attention of practitioners and researchers alike. But as GitHub's growth continues, it is becoming increasingly hard to navigate the plethora of repositories which span a wide range of domains. Past work has shown that taking the application domain into account is crucial for tasks such as predicting the popularity of a repository and reasoning about project quality. In this work, we build on a previously annotated dataset of 5,000 GitHub repositories to design an automated classifier for categorising repositories by their application domain. The classifier uses state-of-the-art natural language processing techniques and machine learning to learn from multiple data sources and catalogue repositories according to five application domains. We contribute with (1) an automated classifier that can assign popular repositories to each application domain with at least 70% precision, (2) an investigation of the approach's performance on less popular repositories, and (3) a practical application of this approach to answer how the adoption of software engineering practices differs across application domains. Our work aims to help the GitHub community identify repositories of interest and opens promising avenues for future work investigating differences between repositories from different application domains.

7.3SEDec 9, 2020

From One to Hundreds: Multi-Licensing in the JavaScript Ecosystem

João Pedro Moraes, Ivanilton Polato, Igor Wiese et al.

Open source licenses create a legal framework that plays a crucial role in the widespread adoption of open source projects. Without a license, any source code available on the internet could not be openly (re)distributed. Although recent studies provide evidence that most popular open source projects have a license, developers might lack confidence or expertise when they need to combine software licenses, leading to a mistaken project license unification.This license usage is challenged by the high degree of reuse that occurs in the heart of modern software development practices, in which third-party libraries and frameworks are easily and quickly integrated into a software codebase.This scenario creates what we call "multi-licensed" projects, which happens when one project has components that are licensed under more than one license. Although these components exist at the file-level, they naturally impact licensing decisions at the project-level. In this paper, we conducted a mix-method study to shed some light on these questions. We started by parsing 1,426,263 (source code and non-source code) files available on 1,552 JavaScript projects, looking for license information. Among these projects, we observed that 947 projects (61%) employ more than one license. On average, there are 4.7 licenses per studied project (max: 256). Among the reasons for multi-licensing is to incorporate the source code of third-party libraries into the project's codebase. When doing so, we observed that 373 of the multi-licensed projects introduced at least one license incompatibility issue. We also surveyed with 83 maintainers of these projects aimed to cross-validate our findings. We observed that 63% of the surveyed maintainers are not aware of the multi-licensing implications. For those that are aware, they adopt multiple licenses mostly to conform with third-party libraries' licenses.

10.4SEDec 7, 2020

How Successful Are Open Source Contributions From Countries with Different Levels of Human Development?

Leonardo Furtado, Bruno Cartaxo, Christoph Treude et al.

Are Brazilian developers less likely to have a contribution accepted than their peers from, say, the United Kingdom? In this paper we studied whether the developers' location relates to the outcome of a pull request. We curated the locations of 14k contributors who performed 44k pull requests to 20 open source projects. Our results indeed suggest that developers from countries with low human development indexes (HDI) not only perform a small fraction of the overall pull requests, but they also are the ones that face rejection the most.

16.5SEJul 27, 2020Code

Work Practices and Perceptions from Women Core Developers in OSS Communities

Edna Dias Canedo, Rodrigo Bonifácio, Márcio Vinícius Okimoto et al.

The effect of gender diversity in open source communities has gained increasing attention from practitioners and researchers. For instance, organizations such as the Python Software Foundation and the OpenStack Foundation started actions to increase gender diversity and promote women to top positions in the communities. Although the general underrepresentation of women (a.k.a. horizontal segregation) in open source communities has been explored in a number of research studies, little is known about the vertical segregation in open source communities -- which occurs when there are fewer women in high-level positions. To address this research gap, in this paper we present the results of a mixed-methods study on gender diversity and work practices of core developers contributing to open-source communities. In the first study, we used mining-software repositories procedures to identify the core developers of 711 open source projects, in order to understand how common are women core developers in open source communities and characterize their work practices. In the second study, we surveyed the women core developers we identified in the first study to collect their perceptions of gender diversity and gender bias they might have observed while contributing to open source systems. Our findings show that open source communities present both horizontal and vertical segregation (only 2.3% of the core developers are women). Nevertheless, differently from previous studies, most of the women core developers (65.7%) report never having experienced gender discrimination when contributing to an open source project. Finally, we did not note substantial differences between the work practices among women and men core developers. We reflect on these findings and present some ideas that might increase the participation of women in open source communities.

10.4SEMar 23, 2020

Characterizing the Roles of Contributors in Open-source Scientific Software Projects

Reed Milewicz, Gustavo Pinto, Paige Rodeghero

The development of scientific software is, more than ever, critical to the practice of science, and this is accompanied by a trend towards more open and collaborative efforts. Unfortunately, there has been little investigation into who is driving the evolution of such scientific software or how the collaboration happens. In this paper, we address this problem. We present an extensive analysis of seven open-source scientific software projects in order to develop an empirically-informed model of the development process. This analysis was complemented by a survey of 72 scientific software developers. In the majority of the projects, we found senior research staff (e.g. professors) to be responsible for half or more of commits (an average commit share of 72%) and heavily involved in architectural concerns (seniors were more likely to interact with files related to the build system, project meta-data, and developer documentation). Juniors (e.g.graduate students) also contribute substantially -- in one studied project, juniors made almost 100% of its commits. Still, graduate students had the longest contribution periods among juniors (with 1.72 years of commit activity compared to 0.98 years for postdocs and 4 months for undergraduates). Moreover, we also found that third-party contributors are scarce, contributing for just one day for the project. The results from this study aim to help scientists to better understand their own projects, communities, and the contributors' behavior, while paving the road for future software engineering research

9.9SEJul 2, 2019Code

Continuous Integration Theater

Wagner Felidré, Leonardo Furtado, Daniel da Costa et al.

Background: Continuous Integration (CI) systems are now the bedrock of several software development practices. Several tools such as TravisCI, CircleCI, and Hudson, that implement CI practices, are commonly adopted by software engineers. However, the way that software engineers use these tools could lead to what we call "Continuous Integration Theater", a situation in which software engineers do not employ these tools effectively, leading to unhealthy CI practices. Aims: The goal of this paper is to make sense of how commonplace are these unhealthy continuous integration practices being employed in practice. Method: By inspecting 1,270 open-source projects that use TravisCI, the most used CI service, we quantitatively studied how common is to use CI (1) with infrequent commits, (2) in a software project with poor test coverage, (3) with builds that stay broken for long periods, and (4) with builds that take too long to run. Results: We observed that 748 ($sim$60%) projects face infrequent commits, which essentially makes the merging process harder. Moreover, we were able to find code coverage information for 51 projects. The average code coverage was 78%, although Ruby projects have a higher code coverage than Java projects (86% and 63%, respectively). However, some projects with very small coverage ($sim$4%) were found. Still, we observed that 85% of the studied projects have at least one broken build that take more than four days to be fixed. Interestingly, very small projects (up to 1,000 lines of code) are the ones that take the longest to fix broken builds. Finally, we noted that, for the majority of the studied projects, the build is executed under the 10 minutes rule of thumb. Conclusions: Our results are important to an increasing community of software engineers that employ CI practices on daily basis but may not be aware of bad practices that are eventually employed.

4.9SEMay 3, 2018

Open Source Development Around the World: A Comparative Study

Thais Mombach, Marco Tulio Valente, Cuiting Chen et al.

Open source software has an increasing importance in our modern society, providing basic services to other software systems and also supporting the rapid development of a variety of end-user applications. Recently, world-wide code sharing platforms, like GitHub, are also contributing to open source's growth. However, little is known on how this growth is distributed around the world and about the characteristics of the projects developed in different countries. In this article, we provide a characterization of 2,648 open source projects developed in 20 countries. We reveal the number of projects per country, the popularity and programming language of each country's project and also show how the number of projects in a country correlates to its GDP. Finally, we assess the maintainability and internal code quality of the studied projects, using a tool called BetterCodeHub.

8.0SEMay 29, 2025

Toward Effective AI Governance: A Review of Principles

Danilo Ribeiro, Thayssa Rocha, Gustavo Pinto et al.

Artificial Intelligence (AI) governance is the practice of establishing frameworks, policies, and procedures to ensure the responsible, ethical, and safe development and deployment of AI systems. Although AI governance is a core pillar of Responsible AI, current literature still lacks synthesis across such governance frameworks and practices. Objective: To identify which frameworks, principles, mechanisms, and stakeholder roles are emphasized in secondary literature on AI governance. Method: We conducted a rapid tertiary review of nine peer-reviewed secondary studies from IEEE and ACM (20202024), using structured inclusion criteria and thematic semantic synthesis. Results: The most cited frameworks include the EU AI Act and NIST RMF; transparency and accountability are the most common principles. Few reviews detail actionable governance mechanisms or stakeholder strategies. Conclusion: The review consolidates key directions in AI governance and highlights gaps in empirical validation and inclusivity. Findings inform both academic inquiry and practical adoption in organizations.

3.6SEOct 23, 2021

Changing Software Engineers' Self-Efficacy with Bootcamps:A Research Proposal

Danilo Monteiro Ribeiro, Alberto Souza, Victor Santiago et al.

In several areas of knowledge, self-efficacy is related to the perfomance of individuals, including in Software Engineering. However,it is not clear how self-efficacy can be modified in training conducted by the industry. Furthermore, we still do not understand how self-efficacy can impact an individual's team and career in the industry. This lack of understanding can negatively impact how companies and individuals perceive the importance of self-efficacy in the field. Therefore, We present a research proposal that aims to understand the relationship between self-efficacy and training in Software Engineering. Moreover, we look to understand the role of self-efficacy at Software Development industry. We propose a longitudinal case study with software engineers at Zup Innovation that participating of our bootcamp training. We expect to collect data to support our assumptions that self-efficacy can be related to training in Software Engineering. The other assumption is that self-efficacy at the beginning of training is higher than the middle, and that self-efficacy at the end of training is higher than the middle. We expect that the study proposed in this article will motivate a discussion about self-efficacy and the importance of training employers in the industry of software development.

6.4SEJul 13, 2021

What Evidence We Would Miss If We Do Not Use Grey Literature?

Fernando Kamei, Gustavo Pinto, Igor Wiese et al.

Context: Over the last years, Grey Literature (GL) is gaining increasing attention in Secondary Studies in Software Engineering (SE). Notably, Multivocal Literature Review (MLR) studies, that search for evidence in both Traditional Literature (TL) and GL, is particularly benefiting from this raise of GL content. Despite the growing interest in MLR-based studies, the literature assessing how GL has contributed to MLR studies is still scarce. Objective: This research aims to assess how the use of GL contributed to MLR studies. By contributing, we mean, understanding to what extent GL is providing evidence that is indeed used by an MLR to answer its research question. Method: We conducted a tertiary study to identify MLR studies published between 2017 and 2019, selecting nine MLRs studies. Using qualitative and quantitative analysis, we identified the GL used and assessed to what extent these MLRs are contributing to MLR studies. Results: Our analysis identified that 1) GL provided evidence not found in TL, 2) most of the GL sources were used to provide recommendations to solve problems, explain a topic, and classify the findings, and 3) 19 different GL types were used in the studies; these GLs were mainly produced by SE practitioners (including blog posts, slides presentations, or project descriptions). Conclusions: We evidence how GL contributed to MLR studies. We observed that if these GLs were not included in the MLR, several findings would have been omitted or weakened. We also described the challenges involved when conducting this investigation, along with potential ways to deal with them, which may help future SE researchers.

8.6SEApr 27, 2021

Grey Literature in Software Engineering: A Critical Review

Fernando Kamei, Igor Wiese, Crescencio Lima et al.

Context: Grey Literature (GL) recently has grown in Software Engineering (SE) research since the increased use of online communication channels by software engineers. However, there is still a limited understanding of how SE research is taking advantage of GL. Objective: This research aimed to understand how SE researchers use GL in their secondary studies. Method: We conducted a tertiary study of studies published between 2011 and 2018 in high-quality software engineering conferences and journals. We then applied qualitative and quantitative analysis to investigate 446 potential studies. Results: From the 446 selected studies, 126 studies cited GL but only 95 of those used GL to answer a specific research question representing almost 21% of all the 446 secondary studies. Interestingly, we identified that few studies employed specific search mechanisms and used additional criteria for assessing GL. Moreover, by the time we conducted this research, 49% of the GL URLs are not working anymore. Based on our findings, we discuss some challenges in using GL and potential mitigation plans. Conclusion: In this paper, we summarized the last 10 years of software engineering research that uses GL, showing that GL has been essential for bringing practical new perspectives that are scarce in traditional literature. By drawing the current landscape of use, we also raise some awareness of related challenges (and strategies to deal with them).

3.0ROMar 25, 2021Code

Mining Energy-Related Practices in Robotics Software

Michel Albonico, Ivano Malavolta, Gustavo Pinto et al.

Robots are becoming more and more commonplace in many industry settings. This successful adoption can be partly attributed to (1) their increasingly affordable cost and (2) the possibility of developing intelligent, software-driven robots. Unfortunately, robotics software consumes significant amounts of energy. Moreover, robots are often battery-driven, meaning that even a small energy improvement can help reduce its energy footprint and increase its autonomy and user experience. In this paper, we study the Robot Operating System (ROS) ecosystem, the de-facto standard for developing and prototyping robotics software. We analyze 527 energy-related data points (including commits, pull-requests, and issues on ROS-related repositories, ROS-related questions on StackOverflow, ROS Discourse, ROS Answers, and the official ROS Wiki). Our results include a quantification of the interest of roboticists on software energy efficiency, 10 recurrent causes, and 14 solutions of energy-related issues, and their implied trade-offs with respect to other quality attributes. Those contributions support roboticists and researchers towards having energy-efficient software in future robotics projects.

7.3SEDec 13, 2020

How Trans-Inclusive are Hackathons?

Rafa Prado, Wendy Galeno, Kiev Gama et al.

Hackathons are fun! People go there to learn, meet new colleagues, intensively work on a collaborative project, and mix pizza with energy drinks. However, for transgender community and other minorities, hackathons can have an uncomfortable atmosphere. Some transgender and non-conforming people that, although enjoying hackathons, decided not to participate anymore, afraid of LGBQTPhobia and other discomforts. In this paper we surveyed 44 trans and cis hackathons participants and interviewed seven transgender ones. By understanding their needs and challenges, we introduce five recommendations to make hackathons more inclusive.

10.4SEDec 7, 2020Code

Exposing Bugs in JavaScript Engines through Test Transplantation and Differential Testing

Igor Lima, Jefferson Silva, Breno Miranda et al.

Context. JavaScript is a popular programming language today with several implementations competing for market dominance. Although a specification document and a conformance test suite exist to guide engine development, bugs occur and have important practical consequences. Implementing correct engines is challenging because the spec is intentionally incomplete and evolves frequently. Objective. This paper investigates the use of test transplantation and differential testing for revealing functional bugs in JavaScript engines. The former technique runs the regression test suite of a given engine on another engine. The latter technique fuzzes existing inputs and then compares the output produced by different engines with a differential oracle. Method. We conducted experiments with engines from five major players-Apple, Facebook, Google, Microsoft, and Mozilla-to assess the effectiveness of test transplantation and differential testing. Results. Our results indicate that both techniques revealed several bugs, many of which confirmed by developers. We reported 35 bugs with test transplantation (23 of these bugs confirmed and 19 fixed) and reported 24 bugs with differential testing (17 of these confirmed and 10 fixed). Results indicate that most of these bugs affected two engines-Apple's JSC and Microsoft's ChakraCore (24 and 26 bugs, respectively). To summarize, our results show that test transplantation and differential testing are easy to apply and very effective in finding bugs in complex software, such as JavaScript engines.

5.3SEDec 7, 2020

Small Changes, Big Impacts: Leveraging Diversity to Improve Energy Efficiency

Wellington Oliveira, Hugo Matalonga, Gustavo Pinto et al.

In the last few years, a growing body of research has proposed methods, techniques, and tools to support developers in the construction of software that consumes less energy. These solutions leverage diverse approaches such as version history mining, analytical models, identifying energy-efficient color schemes, and optimizing the packaging of HTTP requests. In this chapter, we present a complementary approach. We advocate that developers should leverage software diversity to make software systems more energy-efficient. Our main insight is that non-specialists can build software that consumes less energy by alternating at development time between readily available, diversely-designed pieces of software implemented by third-parties. These pieces of software can vary in nature, granularity, and quality attributes. Examples include data structures and constructs for thread management and synchronization.

8.9SESep 13, 2020

On the Use of Grey Literature: A Survey with the Brazilian Software Engineering Research Community

Fernando Kamei, Igor Wiese, Gustavo Pinto et al.

Background: The use of Grey Literature (GL) has been investigate in diverse research areas. In Software Engineering (SE), this topic has an increasing interest over the last years. Problem: Even with the increase of GL published in diverse sources, the understanding of their use on the SE research community is still controversial. Objective: To understand how Brazilian SE researchers use GL, we aimed to become aware of the criteria to assess the credibility of their use, as well as the benefits and challenges. Method: We surveyed 76 active SE researchers participants of a flagship SE conference in Brazil, using a questionnaire with 11 questions to share their views on the use of GL in the context of SE research. We followed a qualitative approach to analyze open questions. Results: We found that most surveyed researchers use GL mainly to understand new topics. Our work identified new findings, including: 1) GL sources used by SE researchers (e.g., blogs, community website); 2) motivations to use (e.g., to understand problems and to complement research findings) or reasons to avoid GL (e.g., lack of reliability, lack of scientific value); 3) the benefit that is easy to access and read GL and the challenge of GL to have its scientific value recognized; and 4) criteria to assess GL credibility, showing the importance of the content owner to be renowned (e.g., renowned author and institutions). Conclusions: Our findings contribute to form a body of knowledge on the use of GL by SE researchers, by discussing novel (some contradictory) results and providing a set of lessons learned to both SE researchers and practitioners.

7.3SEAug 19, 2020

The Organization of Software Teams in the Quest for Continuous Delivery: A Grounded Theory Approach

Leonardo Leite, Gustavo Pinto, Fabio Kon et al.

Context: To accelerate time-to-market and improve customer satisfaction, software-producing organizations have adopted continuous delivery practices, impacting the relations between development and infrastructure professionals. Yet, no substantial literature has substantially tackled how the software industry structures the organization of development and infrastructure teams. Objective: In this study, we investigate how software-producing organizations structure their development and infrastructure teams, specifically how is the division of labor among these groups and how they interact. Method: After brainstorming with 7 DevOps experts to better formulate our research and procedures, we collected and analyzed data from 37 semi-structured interviews with IT professionals, following Grounded Theory guidelines. Results: After a careful analysis, we identified four common organizational structures: (1) siloed departments, (2) classical DevOps, (3) cross-functional teams, and (4) platform teams. We also observed that some companies are transitioning between these structures. Conclusion: The main contribution of this study is a theory in the form of a taxonomy that organizes the found structures along with their properties. This theory could guide researchers and practitioners to think about how to better structure development and infrastructure professionals in software-producing organizations.

14.8SEMar 22, 2020

Rapid Reviews in Software Engineering

Bruno Cartaxo, Gustavo Pinto, Sergio Soares

Integrating research evidence into practice is one of the main goals of Evidence-Based Software Engineering (EBSE). Secondary studies, one of the main EBSE products, are intended to summarize the best research evidence and make them easily consumable by practitioners. However, recent studies show that some secondary studies lack connections with software engineering practice. In this chapter, we present the concept of Rapid Reviews, which are lightweight secondary studies focused on delivering evidence to practitioners in a timely manner. Rapid reviews support practitioners in their decision-making, and should be conducted bounded to a practical problem, inserted into a practical context. Thus, Rapid Reviews can be easily integrated in a knowledge/technology transfer initiative. After describing the basic concepts, we present the results and experiences of conducting two Rapid Reviews. We also provide guidelines to help researchers and practitioners who want to conduct Rapid Reviews, and we finally discuss topics that my concern the research community about the feasibility of Rapid Reviews as an Evidence-Based method. In conclusion, we believe Rapid Reviews might interest researchers and practitioners working in the intersection between software engineering research and practice.

5.3SEFeb 3, 2020

Analyzing the evolution and diversity of SBES Program Committee

Fabio Pacheco, Igor Wiese, Bruno Cartaxo et al.

The Brazilian Symposium on Software Engineering (SBES) is one of the most important Latin American Software Engineering conferences. It was first held in 1987, and in 2019 marks its 33rd edition. Over these years, many researchers have participated in SBES, attending the conference, submitting, and reviewing papers. The researchers who participate in the Program Committee (PC) and perform the reviewers' role are fundamentally important to SBES, since their evaluations (e.g., deciding whether a paper is accepted or not) have the potential of drawing what SBES is now. Knowing that diversity is an important aspect of any group work, we wanted to understand diversity in the SBES PC community. We investigated a number of characteristics of SBES PC members, including their gender and geographic location. We also analyzed the turnover and renovation of the committee. Among the findings, we observed that although the number of participants in the SBES PC has increased over the years, most of them are men (~80%) and from the Southeast and Northeast of Brazil, with very few members from the North region. We also observed that there is a small turnover: during the 2010 decade, only 11% of new members were added to the PC. Finally, we investigated the participation of the PC members publishing papers at SBES. We observed that only 24% of the papers accepted to SBES were authored by members who were not committee members of the respective year. Moreover, committee members usually do not collaborate among themselves: a significant number of the papers are authored by the PC members and students. This paper may contribute to the SBES community, in particular, its special interest group, in understanding the needs and challenges of the PC's participants.

5.0SEJun 26, 2019Code

Software Engineering Research Community Viewpoints on Rapid Reviews

Bruno Cartaxo, Gustavo Pinto, Baldoino Fonseca et al.

Background: One of the most important current challenges of Software Engineering (SE) research is to provide relevant evidence to practice. In health related fields, Rapid Reviews (RRs) have shown to be an effective method to achieve that goal. However, little is known about how the SE research community perceives the potential applicability of RRs. Aims: The goal of this study is to understand the SE research community viewpoints towards the use of RRs as a means to provide evidence to practitioners. Method: To understand their viewpoints, we invited 37 researchers to analyze 50 opinion statements about RRs, and rate them according to what extent they agree with each statement. Q-Methodology was employed to identify the most salient viewpoints, represented by the so called factors. Results: Four factors were identified: Factor A groups undecided researchers that need more evidence before using RRs; Researchers grouped in Factor B are generally positive about RRs, but highlight the need to define minimum standards; Factor C researchers are more skeptical and reinforce the importance of high quality evidence; Researchers aligned to Factor D have a pragmatic point of view, considering RRs can be applied based on the context and constraints faced by practitioners. Conclusions: In conclusion, although there are opposing viewpoints, there are also some common grounds. For example, all viewpoints agree that both RRs and Systematic Reviews can be poorly or well conducted.

11.9SESep 14, 2018

Building a Collaborative Culture: A Grounded Theory of Well Succeeded DevOps Adoption in Practice

Welder Pinheiro Luz, Gustavo Pinto, Rodrigo Bonifácio

Background. DevOps is a set of practices and cultural values that aims to reduce the barriers between development and operations teams. Due to its increasing interest and imprecise definitions, existing research works have tried to characterize DevOps---mainly using a set of concepts and related practices. Aims. Nevertheless, little is known about thepractitioners practitioners' understanding about successful paths for DevOps adoption. The lack of such understanding might hinder institutions to adopt DevOps practices. Therefore, our goal here is to present a theory about DevOps adoption, highlighting the main related concepts that contribute to its adoption in industry. Method. Our work builds upon Classic Grounded Theory. We interviewed practitioners that contributed to DevOps adoption in 15 companies from different domains and across 5 countries. We empirically evaluate our model through a case study, whose goal is to increase the maturity level of DevOps adoption at the Brazilian Federal Court of Accounts, a Brazilian Government institution.Results. This paper presents a model to improve both the understanding and guidance of DevOps adoption. The model increments the existing view of DevOps by explaining the role and motivation of each category (and their relationships) in the DevOps adoption process. We organize this model in terms of DevOps enabler categories and DevOps outcome categories. We provide evidence that collaboration is the core DevOps concern, contrasting with an existing wisdom that implanting specific tools to automate building, deployment, and infrastructure provisioning and management is enough to achieve DevOps. Conclusions. Altogether, our results contribute to (a) generating an adequate understanding of DevOps, from the perspective of practitioners; and (b) assisting other institutions in the migration path towards DevOps adoption.