32.6 · SE · Jun 3
How Software Engineering Students Use LLMs to Write Research Papers: An Experience Report
Ronnie de Souza Santos, Maria Teresa Baldassarre, Cleyton Magalhaes et al.
Large language models are increasingly becoming part of software engineering education, including activities involving empirical software engineering and evidence synthesis. This paper reports an educational experience involving the integration of reflective LLM use into an empirical methods assignment in a third-year software architecture course. Students were asked to develop a short research paper using either a rapid review or a gray literature review methodology and to disclose how LLMs were used throughout the assignment. We analyzed 146 student disclosure statements using a cross-analysis process combining LLM-assisted categorization with manual verification and refinement by the researchers. The reflections describe how students incorporated LLMs during activities such as brainstorming, methodological clarification, organization of findings, and writing refinement, while also reporting concerns regarding inaccuracies and verification of generated content. This experience report discusses lessons learned and educational implications for integrating AI-assisted technologies into empirical software engineering education.
6.6 · SE · Apr 12
A Quasi-Experimental Evaluation of Coaching to Mitigate the Impostor Phenomenon in Early-Career Software Engineers
Paloma Guenes, Joan Leite, Rafael Tomaz et al.
Context: The Impostor Phenomenon (IP), the persistent belief of being a fraud despite evident competence, is common in Software Engineering (SE), where expectations of expertise and innovation are high. Although coaching and similar interventions have been proposed to mitigate IP, empirical evidence of their effect in SE remains scarce. Objective: This study examines the impact of a structured group coaching intervention on reducing IP feelings among early-career software engineers. Method: We conducted a quasi-experiment with 20 participants distributed across two project teams using a wait-list control design, complemented by non-participant observation. The treatment group received a three-session coaching intervention, while the control group received it after an observation phase. IP was assessed using the Clance Impostor Phenomenon Scale (CIPS), alongside measures of well-being (WHO-5), life satisfaction (SWLS), and affect (PANAS). Results: The coaching produced modest reductions in CIPS scores, but the control group also improved during the observation phase, suggesting that contextual and temporal factors may have exerted a stronger influence than the formal intervention. Conclusion: These results suggest that coaching may support reflection and awareness related to IP, but other contextual aspects of team collaboration and project work may also contribute to these changes. This study offers a novel empirical step toward understanding how structured IP interventions operate within SE environments.
CY · May 15, 2024
Trustworthy AI in practice: an analysis of practitioners' needs and challenges
Maria Teresa Baldassarre, Domenico Gigante, Marcos Kalinowski et al.
Recently, both the academic and practitioner communities have paid growing attention to the ability of Artificial Intelligence (AI) systems to operate responsibly and ethically. As a result, a plethora of frameworks and guidelines have appeared to support practitioners in implementing Trustworthy AI (TAI) applications. However, little research has investigated whether and how such frameworks are used. In this work, we study how AI practitioners view TAI principles, how they address them, and what they would like to have (in terms of tools, knowledge, or guidelines) when they attempt to incorporate such principles into the systems they develop. Through a survey and semi-structured interviews, we systematically investigated practitioners' challenges and needs in developing TAI systems. Based on these practical findings, we highlight recommendations to help AI practitioners develop Trustworthy AI applications.
SE · Mar 8
The role of team diversity in AI systems development
Ronnie de Souza Santos, Maria Teresa Baldassarre, Cleyton Magalhaes
The widespread integration of AI technologies has intensified concerns about fairness and bias, as these systems often perpetuate societal inequalities through flawed data and design choices. While software engineering research has largely concentrated on technical solutions, such as improving datasets and models, the social dynamics that shape AI outcomes remain underexplored. This study investigates the role of team diversity in the development of AI systems. Drawing on the experience of four AI-focused teams working in a large software company that operates in Brazil and Portugal and collaborates with global clients, the study explores how diverse teams influence the development of AI systems. Using Grounded Theory, we conducted 25 interviews with software professionals involved in projects spanning domains such as education, energy, accessibility, and facial recognition. Although our study was conducted within a single organization, the variety of its projects, from regional to multinational, ensures exposure to global development practices and diverse team dynamics, bringing a range of perspectives into our findings. Our analysis revealed six key roles that team diversity played in AI development: diversifying perspectives for bias identification, bringing empathy to AI development, addressing systemic discrimination, supporting inclusive and participatory decision-making, using diversity as a safeguard against bias, and fostering broadened thinking in problem solving. These findings highlight the importance of incorporating diverse perspectives in AI projects and offer practical recommendations for integrating fairness considerations into software development practices.
SE · Aug 15, 2021
Crowdsourcing the State of the Art(ifacts)
Maria Teresa Baldassarre, Neil Ernst, Ben Hermann et al.
In any field, finding the "leading edge" of research is an ongoing challenge. Researchers cannot appease reviewers, and educators cannot teach to the leading edge of their field, if no one agrees on what the state of the art is. Using a novel crowdsourced "reuse graph" approach, we propose a new method to identify this state of the art. Our reuse graphs require less effort to build and verify than other community monitoring methods (e.g., artifact tracks or citation-based searches). Based on a study of 170 papers from software engineering (SE) conferences in 2020, we found over 1,600 instances of reuse; i.e., reuse is rampant in SE research. Prior pessimism about a lack of reuse in SE research may have been a result of using the wrong methods to measure the wrong things.
SE · May 7, 2021
Studying Test-Driven Development and its Retainment Over a Six-month Time Span
Maria Teresa Baldassarre, Danilo Caivano, Davide Fucci et al.
In this paper, we investigate the effect of TDD, compared to a non-TDD approach, as well as its retainment (or retention) over a time span of about six months. To pursue these objectives, we conducted a (quantitative) longitudinal cohort study with 30 novice developers (i.e., third-year undergraduate students in Computer Science). We observed that TDD affects neither the external quality of software products nor developers' productivity. However, participants applying TDD produced significantly more tests, with a higher fault-detection capability, than those using a non-TDD approach. As for the retainment of TDD, we found that novice developers retain TDD for at least six months.
SE · Aug 28, 2020
Researcher Bias in Software Engineering Experiments: a Qualitative Investigation
Simone Romano, Davide Fucci, Giuseppe Scanniello et al.
Researcher Bias (RB) occurs when researchers influence the results of an empirical study based on their expectations. RB might be due to the use of Questionable Research Practices (QRPs). In research fields like medicine, blinding techniques have been applied to counteract RB. We conducted an exploratory qualitative survey to investigate RB in Software Engineering (SE) experiments with respect to (i) QRPs potentially leading to RB, (ii) causes behind RB, and (iii) possible actions to counteract RB, including blinding techniques. Data collection was based on semi-structured interviews with nine active experts in the empirical SE community, and we analyzed the interview transcripts through thematic analysis. We found that some QRPs are acceptable in certain cases. Also, RB appears to be perceived as present in SE, and a number of solutions to counteract it were highlighted: some are intended for SE researchers and others for the boards of SE research outlets.
CY · Jul 10, 2020
Secondary Studies in the Academic Context: A Systematic Mapping and Survey
Katia Romero Felizardo, Érica Ferreira de Souza, Bianca Minetto Napoleão et al.
Context: Several researchers have reported their experiences in applying secondary studies (Systematic Literature Reviews, SLRs, and Systematic Mappings, SMs) in Software Engineering (SE). However, there is still a lack of studies discussing the value of performing secondary studies in an academic context. Goal: The main goal of this study is to provide an overview of the use of secondary studies in an academic context. Method: Two empirical research methods were used. First, we conducted an SM to identify the available and relevant studies on the use of secondary studies as a research methodology for conducting SE research projects. Second, we surveyed 64 SE researchers to identify their perceptions of the value of performing secondary studies to support their research projects. Results: Our results show benefits of using secondary studies in the academic context, such as providing an overview of the literature and identifying relevant research literature in a research area, which helps justify why a research project should be approved for a grant and/or supports decisions made in a research project. The difficulties SE graduate students face with secondary studies are that such studies tend to be conducted by a team and demand more effort than a traditional review. Conclusions: Secondary studies are valuable to graduate students, who should consider conducting one for their research project because of the benefits and contributions it provides to the overall project. However, the advice of an experienced supervisor is essential to avoid bias. In addition, the skills acquired can increase students' motivation to pursue their research projects and prepare them for both academic and industrial careers.
SE · Apr 16, 2020
Results from a replicated experiment on the affective reactions of novice developers when applying test-driven development
Simone Romano, Giuseppe Scanniello, Maria Teresa Baldassarre et al.
Test-driven Development (TDD) is an incremental approach to software development. Although it is claimed to improve both software quality and developers' productivity, research on the claimed effects of TDD has so far shown inconclusive results. Some researchers have ascribed these inconclusive results to the negative affective states that TDD would provoke. A previous (baseline) experiment therefore studied the affective reactions of novice developers, i.e., 29 third-year undergraduates in Computer Science (CS), when practicing TDD to implement software. To validate the results of the baseline experiment, we conducted a replicated experiment that studies the affective reactions of novice developers when applying TDD to develop software. Developers in the treatment group carried out a development task using TDD, while those in the control group used a non-TDD approach. To measure the affective reactions of developers, we used the Self-Assessment Manikin instrument complemented with a liking dimension. The most important differences between the baseline and replicated experiments are (i) the kind of novice developers involved (third-year vs. second-year undergraduates in CS from two different universities) and (ii) their number (29 vs. 59). The results of the replicated experiment do not show any difference in the affective reactions of novice developers. In contrast, the results of the baseline experiment suggest that developers seem to like TDD less than a non-TDD approach, that developers following TDD seem to like implementing code less than the other developers, and that testing code seems to make them less happy.
SE · Jul 29, 2019
An Empirical Assessment on Affective Reactions of Novice Developers when Applying Test-Driven Development
Simone Romano, Davide Fucci, Maria Teresa Baldassarre et al.
We study whether, and in which phase, Test-Driven Development (TDD) influences the affective states of novice developers in terms of pleasure, arousal, dominance, and liking. We performed a controlled experiment with 29 novice developers. Developers in the treatment group performed a development task using TDD, whereas those in the control group used a non-TDD development approach. We compared the affective reactions to the two development approaches, as well as to the implementation and testing phases, using the Self-Assessment Manikin, a lightweight, powerful, and widely used tool. We observed a difference between the two development approaches in terms of affective reactions. Therefore, affective reactions seem to play an important role when applying TDD, and investigating them could help researchers better understand this development approach.
SE · Jun 26, 2019
Software Engineering Research Community Viewpoints on Rapid Reviews
Bruno Cartaxo, Gustavo Pinto, Baldoino Fonseca et al.
Background: One of the most important current challenges of Software Engineering (SE) research is to provide relevant evidence to practice. In health-related fields, Rapid Reviews (RRs) have been shown to be an effective method to achieve that goal. However, little is known about how the SE research community perceives the potential applicability of RRs. Aims: The goal of this study is to understand the SE research community's viewpoints on the use of RRs as a means to provide evidence to practitioners. Method: To understand their viewpoints, we invited 37 researchers to analyze 50 opinion statements about RRs and rate the extent to which they agree with each statement. Q-Methodology was employed to identify the most salient viewpoints, represented by the so-called factors. Results: Four factors were identified: Factor A groups undecided researchers who need more evidence before using RRs; researchers grouped in Factor B are generally positive about RRs but highlight the need to define minimum standards; Factor C researchers are more skeptical and reinforce the importance of high-quality evidence; researchers aligned with Factor D have a pragmatic point of view, considering that RRs can be applied based on the context and constraints faced by practitioners. Conclusions: Although there are opposing viewpoints, there is also some common ground. For example, all viewpoints agree that both RRs and Systematic Reviews can be conducted poorly or well.
SE · Apr 29, 2019
How software engineering research aligns with design science: A review
Emelie Engström, Margaret-Anne Storey, Per Runeson et al.
Background: Assessing and communicating software engineering research can be challenging. Design science is recognized as an appropriate research paradigm for applied research but is seldom referred to in software engineering. Applying the design science lens to software engineering research may improve the assessment and communication of research contributions. Aim: The aim of this study is 1) to understand whether the design science lens helps summarize and assess software engineering research contributions, and 2) to characterize different types of design science contributions in the software engineering literature. Method: In previous research, we developed a visual abstract template summarizing the core constructs of the design science paradigm. In this study, we use this template in a review of 38 top software engineering publications to extract and analyze their design science contributions. Results: We identified five clusters of papers, classifying them according to their alignment with the design science paradigm. Conclusions: The design science lens helps emphasize the theoretical contribution of research output, in terms of technological rules, and reflect on the practical relevance, novelty, and rigor of the rules proposed by the research.
SE · Jul 9, 2018
A Longitudinal Cohort Study on the Retainment of Test-Driven Development
Davide Fucci, Simone Romano, Maria Teresa Baldassarre et al.
Background: Test-Driven Development (TDD) is an agile software development practice that is claimed to boost both the external quality of software products and developers' productivity. Aims: We want to study (i) the effects of TDD on the external quality of software products and on developers' productivity, and (ii) the retainment of TDD over a period of five months. Method: We conducted a (quantitative) longitudinal cohort study with 30 third-year undergraduate students in Computer Science at the University of Bari in Italy. Results: TDD has no statistically significant effect on either the external quality of software products or developers' productivity. However, we observed that participants using TDD produced significantly more tests than those applying a non-TDD development process, and that the retainment of TDD is particularly noticeable in the number of tests written. Conclusions: Our results should encourage software companies to adopt TDD, because those who practice TDD tend to write more tests (having more tests can come in handy when testing software systems or localizing faults), and novice developers seem to retain TDD.