Marcos Kalinowski

SE
h-index47
35papers
980citations
Novelty23%
AI Score49

35 Papers

96.1SEMay 10
Guidelines for Empirical Studies in Software Engineering involving Large Language Models

Sebastian Baltes, Florian Angermeir, Chetan Arora et al.

Large Language Models (LLMs) are widely used in software engineering (SE) research and practice, yet their non-determinism, opaque training data, and rapidly evolving models threaten the reproducibility and replicability of empirical studies. We address this challenge through a collaborative effort of 22 researchers, presenting a taxonomy of seven study types that organizes how LLMs are used in SE research, together with eight guidelines for designing and reporting such studies. Each guideline distinguishes requirements (must) from recommended practices (should) and is contextualized by the study types it applies to. Our guidelines recommend that researchers: (1) declare LLM usage and role; (2) report model versions, configurations, and customizations; (3) document the tool architecture beyond the model; (4) disclose prompts, their development, and interaction logs; (5) validate LLM outputs with humans; (6) include an open LLM as a baseline; (7) use suitable baselines, benchmarks, and metrics; and (8) articulate limitations and mitigations. We complement the guidelines with an applicability matrix mapping guidelines to study types and a reporting checklist for authors and reviewers. We maintain the study types and guidelines online as a living resource for the community to use and shape (llm-guidelines$.$org).

38.0SEMay 31
Understanding Undesirable Attributes of Requirements Engineers: Insights from Practitioners

Larissa Barbosa, Sávio Freire, Marcos Kalinowski et al.

Context. The characteristics of software professionals have been widely investigated in the literature. However, limited attention has been given to undesirable attributes in Requirements Engineering, despite the strong dependence of this activity on stakeholder interaction and collaboration. Objectives. This study investigates the undesirable attributes of requirements engineers' hat may hinder collaboration and project success. Method. We surveyed software practitioners to identify these attributes and conducted interviews to gather supporting evidence. Results. Seventeen undesirable attributes were identified, grouped into four categories (communication issues, lack of domain knowledge, personality, and lack of technical knowledge), and organized into conceptual maps. Conclusion. The maps help requirements engineers reflect on and improve their professional practice by recognizing traits that may hinder collaboration and project outcomes.

SEJul 20, 2023
Assessing the Use of AutoML for Data-Driven Software Engineering

Fabio Calefato, Luigi Quaranta, Filippo Lanubile et al.

Background. Due to the widespread adoption of Artificial Intelligence (AI) and Machine Learning (ML) for building software applications, companies are struggling to recruit employees with a deep understanding of such technologies. In this scenario, AutoML is soaring as a promising solution to fill the AI/ML skills gap since it promises to automate the building of end-to-end AI/ML pipelines that would normally be engineered by specialized team members. Aims. Despite the growing interest and high expectations, there is a dearth of information about the extent to which AutoML is currently adopted by teams developing AI/ML-enabled systems and how it is perceived by practitioners and researchers. Method. To fill these gaps, in this paper, we present a mixed-method study comprising a benchmark of 12 end-to-end AutoML tools on two SE datasets and a user survey with follow-up interviews to further our understanding of AutoML adoption and perception. Results. We found that AutoML solutions can generate models that outperform those trained and optimized by researchers to perform classification tasks in the SE domain. Also, our findings show that the currently available AutoML solutions do not live up to their names as they do not equally support automation across the stages of the ML development workflow and for all the team members. Conclusions. We derive insights to inform the SE research community on how AutoML can facilitate their activities and tool builders on how to design the next generation of AutoML technologies.

SEJun 20, 2022
Towards Perspective-Based Specification of Machine Learning-Enabled Systems

Hugo Villamizar, Marcos Kalinowski, Helio Lopes

Machine learning (ML) teams often work on a project just to realize the performance of the model is not good enough. Indeed, the success of ML-enabled systems involves aligning data with business problems, translating them into ML tasks, experimenting with algorithms, evaluating models, capturing data from users, among others. Literature has shown that ML-enabled systems are rarely built based on precise specifications for such concerns, leading ML teams to become misaligned due to incorrect assumptions, which may affect the quality of such systems and overall project success. In order to help addressing this issue, this paper describes our work towards a perspective-based approach for specifying ML-enabled systems. The approach involves analyzing a set of 45 ML concerns grouped into five perspectives: objectives, user experience, infrastructure, model, and data. The main contribution of this paper is to provide two new artifacts that can be used to help specifying ML-enabled systems: (i) the perspective-based ML task and concern diagram and (ii) the perspective-based ML specification template.

14.2SEMay 26
Software Engineering Podcasts: An Empirical Study of Their Potential as a Research Resource

Marvin Wyrich, Marcos Kalinowski, Adolfo Neto et al.

Podcasts have become an increasingly popular medium for knowledge sharing within the software engineering (SE) community, offering insights into industry developments and the perspectives of professionals with different backgrounds. As this medium grows, it presents a potentially valuable resource not only for practitioners but also for researchers seeking to understand the evolving field. However, little is known about the actual content of SE podcasts or how they are perceived and used by researchers. This study systematically explores the SE podcast landscape, analyzing its content and surveying researchers to assess how podcasts can serve as a meaningful resource for advancing empirical software engineering research.

6.6SEApr 12
A Quasi-Experimental Evaluation of Coaching to Mitigate the Impostor Phenomenon in Early-Career Software Engineers

Paloma Guenes, Joan Leite, Rafael Tomaz et al.

Context: The Impostor Phenomenon (IP), the persistent belief of being a fraud despite evident competence, is common in Software Engineering (SE), where high expectations for expertise and innovation prevail. Although coaching and similar interventions are proposed to mitigate IP, empirical evidence in SE remains underexplored. Objective: This study examines the impact of a structured group coaching intervention on reducing IP feelings among early-career software engineers. Method: We conducted a quasi-experiment with 20 participants distributed across two project teams using a wait-list control design, complemented by non-participant observation. The treatment group received a three-session coaching intervention, while the control group received it after an observation phase. IP was assessed using the Clance Impostor Phenomenon Scale (CIPS), alongside evaluated measures of well-being (WHO-5), life satisfaction (SWLS), and affect (PANAS). Results: The coaching resulted in modest reductions in CIPS scores, whereas the control group also improved during the observation phase, suggesting that contextual and temporal factors may have exerted a stronger influence than the formal intervention. Conclusion: These results suggest that coaching may support reflection and awareness related to IP, yet other contextual aspects of team collaboration and project work might also contribute to these changes. This study offers a novel empirical step toward understanding how structured IP interventions operate within SE environments.

68.4SEApr 13
Taking a Pulse on How Generative AI is Reshaping the Software Engineering Research Landscape

Bianca Trinkenreich, Fabio Calefato, Kelly Blincoe et al.

Context: Software engineering (SE) researchers increasingly study Generative AI (GenAI) while also incorporating it into their own research practices. Despite rapid adoption, there is limited empirical evidence on how GenAI is used in SE research and its implications for research practices and governance. Aims: We conduct a large-scale survey of 457 SE researchers publishing in top venues between 2023 and 2025. Method: Using quantitative and qualitative analyses, we examine who uses GenAI and why, where it is used across research activities, and how researchers perceive its benefits, opportunities, challenges, risks, and governance. Results: GenAI use is widespread, with many researchers reporting pressure to adopt and align their work with it. Usage is concentrated in writing and early-stage activities, while methodological and analytical tasks remain largely human-driven. Although productivity gains are widely perceived, concerns about trust, correctness, and regulatory uncertainty persist. Researchers highlight risks such as inaccuracies and bias, emphasize mitigation through human oversight and verification, and call for clearer governance, including guidance on responsible use and peer review. Conclusion: We provide a fine-grained, SE-specific characterization of GenAI use across research activities, along with taxonomies of GenAI use cases for research and peer review, opportunities, risks, mitigation strategies, and governance needs. These findings establish an empirical baseline for the responsible integration of GenAI into academic practice.

31.7SEApr 1
An Empirical Study of Generative AI Adoption in Software Engineering

Görkem Giray, Onur Demirörs, Marcos Kalinowski et al.

Context. GenAI tools are being increasingly adopted by practitioners in SE, promising support for several SE activities. Despite increasing adoption, we still lack empirical evidence on how GenAI is used in practice, the benefits it provides, the challenges it introduces, and its broader organizational and societal implications. Objective. This study aims to provide an overview of the status of GenAI adoption in SE. It investigates the status of GenAI adoption, associated benefits and challenges, institutionalization of tools and techniques, and anticipated long term impacts on SE professionals and the community. Results. The results indicate a wide adoption of GenAI tools and how they are deeply integrated into daily SE work, particularly for implementation, verification and validation, personal assistance, and maintenance-related tasks. Practitioners report substantial benefits, most notably reduction in cycle time, quality improvements, enhanced support in knowledge work, and productivity gains. However, objective measurement of productivity and quality remains limited in practice. Significant challenges persist, including incorrect or unreliable outputs, prompt engineering difficulties, validation overhead, security and privacy concerns, and risks of overreliance. Institutionalization of tools and techniques seems to be common, but it varies considerably, with a strong focus on tool access and less emphasis on training and governance. Practitioners expect GenAI to redefine rather than replace their roles, while expressing moderate concern about job market contraction and skill shifts.

LGAug 28, 2022
Predicting IMDb Rating of TV Series with Deep Learning: The Case of Arrow

Anna Luiza Gomes, Getúlio Vianna, Tatiana Escovedo et al.

Context: The number of TV series offered nowadays is very high. Due to its large amount, many series are canceled due to a lack of originality that generates a low audience. Problem: Having a decision support system that can show why some shows are a huge success or not would facilitate the choices of renewing or starting a show. Solution: We studied the case of the series Arrow broadcasted by CW Network and used descriptive and predictive modeling techniques to predict the IMDb rating. We assumed that the theme of the episode would affect its evaluation by users, so the dataset is composed only by the director of the episode, the number of reviews that episode got, the percentual of each theme extracted by the Latent Dirichlet Allocation (LDA) model of an episode, the number of viewers from Wikipedia and the rating from IMDb. The LDA model is a generative probabilistic model of a collection of documents made up of words. Method: In this prescriptive research, the case study method was used, and its results were analyzed using a quantitative approach. Summary of Results: With the features of each episode, the model that performed the best to predict the rating was Catboost due to a similar mean squared error of the KNN model but a better standard deviation during the test phase. It was possible to predict IMDb ratings with an acceptable root mean squared error of 0.55.

66.7SEMay 12
A Research Agenda on Agents and Software Engineering: Outcomes from the Rio A2SE Seminar

Davide Taibi, Henry Muccini, Karthik Vaidhyanathan et al.

The rise of agentic AI is reshaping software engineering in two intertwined directions: agents are increasingly applied to support software engineering tasks, and Agentic AI systems themselves are complex systems that require re-thinking currently established software engineering practices. To chart a coherent research agenda covering the two directions, we organized the A2SE seminar in Rio de Janeiro, bringing together 18 experts from academia and industry. Through structured presentations, collaborative topic clustering, and focused group discussions, participants identified six thematic areas: Governance, Software Engineering for Agents, Agents for Software Architecture, Quality and Evaluation, Sustainability, and Code, and they prioritized short-term and long-term research directions for each. This paper presents the resulting community-driven, opinionated research agenda, offering the SE community a structured foundation for coordinating efforts at this critical juncture.

SESep 9, 2021Code
Cataloging Dependency Injection Anti-Patterns in Software Systems

Rodrigo Laigner, Diogo Mendonça, Alessandro Garcia et al.

Context: Dependency Injection (DI) is a commonly applied mechanism to decouple classes from their dependencies in order to provide higher modularization. However, bad DI practices often lead to negative consequences, such as increasing coupling. Although white literature conjectures about the existence of DI anti-patterns, there is no evidence on their practical relevance, usefulness, and generality. Objective: The objective of this study is to propose and evaluate a catalog of Java DI anti-patterns and associated refactorings. Methodology: We reviewed existing reported DI anti-patterns in order to analyze their completeness. The limitations found in literature motivated proposing a novel catalog of 12 DI anti-patterns. We developed a tool to statically analyze the occurrence level of the candidate DI anti-patterns in both open-source and industry projects. Next, we survey practitioners to assess their perception on the relevance, usefulness, and their willingness on refactoring anti-pattern instances of the catalog. Results: Our static code analyzer tool showed a relative recall of 92.19% and high average precision. It revealed that at least 9 different DI anti-patterns appeared frequently in the analyzed projects. Besides, our survey confirmed the perceived relevance of the catalog and developers expressed their willingness to refactor instances of anti-patterns from source code. Conclusion: The catalog contains Java DI anti-patterns that occur in practice and that are perceived as useful. Sharing it with practitioners may help them to avoid such anti-patterns, thus improving source-code quality.

DBFeb 27, 2021Code
Data Management in Microservices: State of the Practice, Challenges, and Research Directions

Rodrigo Laigner, Yongluan Zhou, Marcos Antonio Vaz Salles et al.

Microservices have become a popular architectural style for data-driven applications, given their ability to functionally decompose an application into small and autonomous services to achieve scalability, strong isolation, and specialization of database systems to the workloads and data formats of each service. Despite the accelerating industrial adoption of this architectural style, an investigation of the state of the practice and challenges practitioners face regarding data management in microservices is lacking. To bridge this gap, we conducted a systematic literature review of representative articles reporting the adoption of microservices, we analyzed a set of popular open-source microservice applications, and we conducted an online survey to cross-validate the findings of the previous steps with the perceptions and experiences of over 120 experienced practitioners and researchers. Through this process, we were able to categorize the state of practice of data management in microservices and observe several foundational challenges that cannot be solved by software engineering practices alone, but rather require system-level support to alleviate the burden imposed on practitioners. We discuss the shortcomings of state-of-the-art database systems regarding microservices and we conclude by devising a set of features for microservice-oriented database systems.

0.7SEMar 20
Teaching Practically Relevant Research Problem Formulation in Software Engineering with Lean Research Inception

Anrafel Fernandes Pereira, Tatiane Ornelas, Allysson Allex Araujo et al.

[Background] Well-formulated Software Engineering (SE) research problems are essential for bridging the gap between industry-academia. Lean Research Inception (LRI) aims to support this activity. [Goal] Apply LRI to support SE students in formulating practice-aligned research problems. [Method] We conducted a case study with 60 students and 7 faculty advisors of a Brazilian university. [Results] Students reported benefits in reasoning (60%), clarity and definition (61.7%), contextualization (60%), and communication (50%). Advisors also observed clearer and more structured problems (57.1%) with a high recommendation rate (85.7%). [Conclusion] LRI can be a promising approach to support practice-aligned research problem formulation in SE education.

SEJun 15, 2025
Get on the Train or be Left on the Station: Using LLMs for Software Engineering Research

Bianca Trinkenreich, Fabio Calefato, Geir Hanssen et al.

The adoption of Large Language Models (LLMs) is not only transforming software engineering (SE) practice but is also poised to fundamentally disrupt how research is conducted in the field. While perspectives on this transformation range from viewing LLMs as mere productivity tools to considering them revolutionary forces, we argue that the SE research community must proactively engage with and shape the integration of LLMs into research practices, emphasizing human agency in this transformation. As LLMs rapidly become integral to SE research - both as tools that support investigations and as subjects of study - a human-centric perspective is essential. Ensuring human oversight and interpretability is necessary for upholding scientific rigor, fostering ethical responsibility, and driving advancements in the field. Drawing from discussions at the 2nd Copenhagen Symposium on Human-Centered AI in SE, this position paper employs McLuhan's Tetrad of Media Laws to analyze the impact of LLMs on SE research. Through this theoretical lens, we examine how LLMs enhance research capabilities through accelerated ideation and automated processes, make some traditional research practices obsolete, retrieve valuable aspects of historical research approaches, and risk reversal effects when taken to extremes. Our analysis reveals opportunities for innovation and potential pitfalls that require careful consideration. We conclude with a call to action for the SE research community to proactively harness the benefits of LLMs while developing frameworks and guidelines to mitigate their risks, to ensure continued rigor and impact of research in an AI-augmented future.

CYMay 15, 2024
Trustworthy AI in practice: an analysis of practitioners' needs and challenges

Maria Teresa Baldassarre, Domenico Gigante, Marcos Kalinowski et al.

Recently, there has been growing attention on behalf of both academic and practice communities towards the ability of Artificial Intelligence (AI) systems to operate responsibly and ethically. As a result, a plethora of frameworks and guidelines have appeared to support practitioners in implementing Trustworthy AI applications (TAI). However, little research has been done to investigate whether such frameworks are being used and how. In this work, we study the vision AI practitioners have on TAI principles, how they address them, and what they would like to have - in terms of tools, knowledge, or guidelines - when they attempt to incorporate such principles into the systems they develop. Through a survey and semi-structured interviews, we systematically investigated practitioners' challenges and needs in developing TAI systems. Based on these practical findings, we highlight recommendations to help AI practitioners develop Trustworthy AI applications.

SEMay 20, 2024
Naming the Pain in Machine Learning-Enabled Systems Engineering

Marcos Kalinowski, Daniel Mendez, Görkem Giray et al.

Context: Machine learning (ML)-enabled systems are being increasingly adopted by companies aiming to enhance their products and operational processes. Objective: This paper aims to deliver a comprehensive overview of the current status quo of engineering ML-enabled systems and lay the foundation to steer practically relevant and problem-driven academic research. Method: We conducted an international survey to collect insights from practitioners on the current practices and problems in engineering ML-enabled systems. We received 188 complete responses from 25 countries. We conducted quantitative statistical analyses on contemporary practices using bootstrapping with confidence intervals and qualitative analyses on the reported problems using open and axial coding procedures. Results: Our survey results reinforce and extend existing empirical evidence on engineering ML-enabled systems, providing additional insights into typical ML-enabled systems project contexts, the perceived relevance and complexity of ML life cycle phases, and current practices related to problem understanding, model deployment, and model monitoring. Furthermore, the qualitative analysis provides a detailed map of the problems practitioners face within each ML life cycle phase and the problems causing overall project failure. Conclusions: The results contribute to a better understanding of the status quo and problems in practical environments. We advocate for the further adaptation and dissemination of software engineering practices to enhance the engineering of ML-enabled systems.

CLNov 18, 2024
Large Language Model for Qualitative Research -- A Systematic Mapping Study

Cauã Ferreira Barros, Bruna Borges Azevedo, Valdemar Vicente Graciano Neto et al.

The exponential growth of text-based data in domains such as healthcare, education, and social sciences has outpaced the capacity of traditional qualitative analysis methods, which are time-intensive and prone to subjectivity. Large Language Models (LLMs), powered by advanced generative AI, have emerged as transformative tools capable of automating and enhancing qualitative analysis. This study systematically maps the literature on the use of LLMs for qualitative research, exploring their application contexts, configurations, methodologies, and evaluation metrics. Findings reveal that LLMs are utilized across diverse fields, demonstrating the potential to automate processes traditionally requiring extensive human input. However, challenges such as reliance on prompt engineering, occasional inaccuracies, and contextual limitations remain significant barriers. This research highlights opportunities for integrating LLMs with human expertise, improving model robustness, and refining evaluation methodologies. By synthesizing trends and identifying research gaps, this study aims to guide future innovations in the application of LLMs for qualitative analysis.

SEJun 25, 2025
Define-ML: An Approach to Ideate Machine Learning-Enabled Systems

Silvio Alonso, Antonio Pedro Santos Alves, Lucas Romao et al.

[Context] The increasing adoption of machine learning (ML) in software systems demands specialized ideation approaches that address ML-specific challenges, including data dependencies, technical feasibility, and alignment between business objectives and probabilistic system behavior. Traditional ideation methods like Lean Inception lack structured support for these ML considerations, which can result in misaligned product visions and unrealistic expectations. [Goal] This paper presents Define-ML, a framework that extends Lean Inception with tailored activities - Data Source Mapping, Feature-to-Data Source Mapping, and ML Mapping - to systematically integrate data and technical constraints into early-stage ML product ideation. [Method] We developed and validated Define-ML following the Technology Transfer Model, conducting both static validation (with a toy problem) and dynamic validation (in a real-world industrial case study). The analysis combined quantitative surveys with qualitative feedback, assessing utility, ease of use, and intent of adoption. [Results] Participants found Define-ML effective for clarifying data concerns, aligning ML capabilities with business goals, and fostering cross-functional collaboration. The approach's structured activities reduced ideation ambiguity, though some noted a learning curve for ML-specific components, which can be mitigated by expert facilitation. All participants expressed the intention to adopt Define-ML. [Conclusion] Define-ML provides an openly available, validated approach for ML product ideation, building on Lean Inception's agility while aligning features with available data and increasing awareness of technical feasibility.

HCOct 29, 2025
User Misconceptions of LLM-Based Conversational Programming Assistants

Gabrielle O'Brien, Antonio Pedro Santos Alves, Sebastian Baltes et al.

Programming assistants powered by large language models (LLMs) have become widely available, with conversational assistants like ChatGPT proving particularly accessible to less experienced programmers. However, the varied capabilities of these tools across model versions and the mixed availability of extensions that enable web search, code execution, or retrieval-augmented generation create opportunities for user misconceptions about what systems can and cannot do. Such misconceptions may lead to over-reliance, unproductive practices, or insufficient quality control in LLM-assisted programming. Here, we aim to characterize misconceptions that users of conversational LLM-based assistants may have in programming contexts. Using a two-phase approach, we first brainstorm and catalog user misconceptions that may occur, and then conduct a qualitative analysis to examine whether these conceptual issues surface in naturalistic Python-programming conversations with an LLM-based chatbot drawn from an openly available dataset. Indeed, we see evidence that some users have misplaced expectations about the availability of LLM-based chatbot features like web access, code execution, or non-text output generation. We also see potential evidence for deeper conceptual issues around the scope of information required to debug, validate, and optimize programs. Our findings reinforce the need for designing LLM-based tools that more clearly communicate their programming capabilities to users.

SEJun 25, 2025
Agile Management for Machine Learning: A Systematic Mapping Study

Lucas Romao, Hugo Villamizar, Romeu Oliveira et al.

[Context] Machine learning (ML)-enabled systems are present in our society, driving significant digital transformations. The dynamic nature of ML development, characterized by experimental cycles and rapid changes in data, poses challenges to traditional project management. Agile methods, with their flexibility and incremental delivery, seem well-suited to address this dynamism. However, it is unclear how to effectively apply these methods in the context of ML-enabled systems, where challenges require tailored approaches. [Goal] Our goal is to outline the state of the art in agile management for ML-enabled systems. [Method] We conducted a systematic mapping study using a hybrid search strategy that combines database searches with backward and forward snowballing iterations. [Results] Our study identified 27 papers published between 2008 and 2024. From these, we identified eight frameworks and categorized recommendations and practices into eight key themes, such as Iteration Flexibility, Innovative ML-specific Artifacts, and the Minimal Viable Model. The main challenge identified across studies was accurate effort estimation for ML-related tasks. [Conclusion] This study contributes by mapping the state of the art and identifying open gaps in the field. While relevant work exists, more robust empirical evaluation is still needed to validate these contributions.

SESep 23, 2021
What Makes Agile Software Development Agile?

Marco Kuhrmann, Paolo Tell, Regina Hebig et al.

Together with many success stories, promises such as the increase in production speed and the improvement in stakeholders' collaboration have contributed to making agile a transformation in the software industry in which many companies want to take part. However, driven either by a natural and expected evolution or by contextual factors that challenge the adoption of agile methods as prescribed by their creator(s), software processes in practice mutate into hybrids over time. Are these still agile? In this article, we investigate the question: what makes a software development method agile? We present an empirical study grounded in a large-scale international survey that aims to identify software development methods and practices that improve or tame agility. Based on 556 data points, we analyze the perceived degree of agility in the implementation of standard project disciplines and its relation to used development methods and practices. Our findings suggest that only a small number of participants operate their projects in a purely traditional or agile manner (under 15%). That said, most project disciplines and most practices show a clear trend towards increasing degrees of agility. Compared to the methods used to develop software, the selection of practices has a stronger effect on the degree of agility of a given discipline. Finally, there are no methods or practices that explicitly guarantee or prevent agility. We conclude that agility cannot be defined solely at the process level. Additional factors need to be taken into account when trying to implement or improve agility in a software company. Finally, we discuss the field of software process-related research in the light of our findings and present a roadmap for future research.

SEAug 26, 2021
On Psychometric Instruments in Software Engineering Research: An Ongoing Study

Danilo Almeida Felipe, Marcos Kalinowski

[Context] Although software development is an inherently human activity, research in software engineering (SE) has long focused mostly on processes and tools, failing to recall about the human factors behind. Even when explored, researchers typically do not properly use psychology background to better understand human factors in SE, such as the psychometric instruments, which aim to measure human factors. [Objective] Our goal is to provide a critical review on the use of psychometric instruments in SE research regarding personality. [Method] We present a two-step study. First, a systematic mapping of the literature in order to generate a catalog of the psychometric instruments used; second, a preliminary survey to be conducted with social sciences researchers to assess their adoption in SE research. [Results and Conclusion] The results so far are quite initial. The next steps direct us to finish the data extraction to finalize the catalog (systematic mapping) and to refine the survey design and apply it with social sciences researchers.

SENov 25, 2020
An Empirical Investigation on the Challenges of Creating Custom Static Analysis Rules for Defect Localization

Diogo Silveira Mendonça, Marcos Kalinowski

Background: Custom static analysis rules, i.e., rules specific for one or more applications, have been successfully applied to perform corrective and preventive software maintenance. Pattern-Driven Maintenance (PDM) is a method designed to support the creation of such rules during software maintenance. However, as PDM was recently proposed, few maintainers have reported on its usage. Hence, the challenges and skills needed to apply PDM properly are unknown. Aims: In this paper, we investigate the challenges faced by maintainers on applying PDM for creating custom static analysis rules for defect localization. Method: We conducted an observational study on novice maintainers creating custom static analysis rules by applying PDM. The study was divided into three tasks: (i) identifying a defect pattern, (ii) programming a static analysis rule to locate instances of the pattern, and (iii) verifying the located instances. We analyzed the efficiency and acceptance of maintainers on applying PDM and their comments on task challenges. Results: We observed that previous knowledge on debugging, the subject software, and related technologies influenced the performance of maintainers as well as the time to learn the technology involved in rule programming. Conclusions: The results strengthen our confidence that PDM can help maintainers in producing custom static analysis rules for locating defects. However, a proper selection and training of maintainers is needed to apply PDM effectively. Also, using a higher level of abstraction can ease static analysis rule programming for novice maintainers.

SESep 6, 2020
An Efficient Approach for Reviewing Security-Related Aspects in Agile Requirements Specifications of Web Applications

Hugo Villamizar, Marcos Kalinowski, Alessandro Garcia et al.

Defects in requirements specifications can have severe consequences during the software development lifecycle. Some of them may result in poor product quality and/or time and budget overruns due to incorrect or missing quality characteristics, such as security. This characteristic requires special attention in web applications because they have become a target for manipulating sensible data. Several concerns make security difficult to deal with. For instance, security requirements are often misunderstood and improperly specified due to lack of security expertise and emphasis on security during early stages of software development. This often leads to unspecified or ill-defined security-related aspects. These concerns become even more challenging in agile contexts, where lightweight documentation is typically produced. To tackle this problem, we designed an approach for reviewing security-related aspects in agile requirements specifications of web applications. Our proposal considers user stories and security specifications as inputs and relates those user stories to security properties via Natural Language Processing. Based on the related security properties, our approach identifies high-level security requirements from the Open Web Application Security Project (OWASP) to be verified, and generates a reading technique to support reviewers in detecting defects. We evaluate our approach via three experiment trials conducted with 56 novice software engineers, measuring effectiveness, efficiency, usefulness, and ease of use. We compare our approach against using: (1) the OWASP high-level security requirements, and (2) a perspective-based approach as proposed in contemporary state of the art. The results strengthen our confidence that using our approach has a positive impact (with large effect size) on the performance of inspectors in terms of effectiveness and efficiency.

SEJun 9, 2020
Guidelines for the Search Strategy to Update Systematic Literature Reviews in Software Engineering

Claes Wohlin, Emilia Mendes, Katia Romero Felizardo et al.

Context: Systematic Literature Reviews (SLRs) have been adopted within Software Engineering (SE) for more than a decade to provide meaningful summaries of evidence on several topics. Many of these SLRs are now potentially not fully up-to-date, and there are no standard proposals on how to update SLRs in SE. Objective: The objective of this paper is to propose guidelines on how to best search for evidence when updating SLRs in SE, and to evaluate these guidelines using an SLR that was not employed during the formulation of the guidelines. Method: To propose our guidelines, we compare and discuss outcomes from applying different search strategies to identify primary studies in a published SLR, an SLR update, and two replications in the area of effort estimation. These guidelines are then evaluated using an SLR in the area of software ecosystems, its update and a replication. Results: The use of a single iteration forward snowballing with Google Scholar, and employing as a seed set the original SLR and its primary studies is the most cost-effective way to search for new evidence when updating SLRs. Furthermore, the importance of having more than one researcher involved in the selection of papers when applying the inclusion and exclusion criteria is highlighted through the results. Conclusions: Our proposed guidelines formulated based upon an effort estimation SLR, its update and two replications, were supported when using an SLR in the area of software ecosystems, its update and a replication. Therefore, we put forward that our guidelines ought to be adopted for updating SLRs in SE.

SEMay 3, 2020
Pandemic Programming: How COVID-19 affects software developers and how their organizations can help

Paul Ralph, Sebastian Baltes, Gianisa Adisaputri et al.

Context. As a novel coronavirus swept the world in early 2020, thousands of software developers began working from home. Many did so on short notice, under difficult and stressful conditions. Objective. This study investigates the effects of the pandemic on developers' wellbeing and productivity. Method. A questionnaire survey was created mainly from existing, validated scales and translated into 12 languages. The data was analyzed using non-parametric inferential statistics and structural equation modeling. Results. The questionnaire received 2225 usable responses from 53 countries. Factor analysis supported the validity of the scales and the structural model achieved a good fit (CFI = 0.961, RMSEA = 0.051, SRMR = 0.067). Confirmatory results include: (1) the pandemic has had a negative effect on developers' wellbeing and productivity; (2) productivity and wellbeing are closely related; (3) disaster preparedness, fear related to the pandemic and home office ergonomics all affect wellbeing or productivity. Exploratory analysis suggests that: (1) women, parents and people with disabilities may be disproportionately affected; (2) different people need different kinds of support. Conclusions. To improve employee productivity, software companies should focus on maximizing employee wellbeing and improving the ergonomics of employees' home offices. Women, parents and disabled persons may require extra support.

DLApr 21, 2020
On the Performance of Hybrid Search Strategies for Systematic Literature Reviews in Software Engineering

Erica Mourão, João Felipe Pimentel, Leonardo Murta et al.

Context: When conducting a Systematic Literature Review (SLR), researchers usually face the challenge of designing a search strategy that appropriately balances result quality and review effort. Using digital library (or database) searches or snowballing alone may not be enough to achieve high-quality results. On the other hand, using both digital library searches and snowballing together may increase the overall review effort. Objective: The goal of this research is to propose and evaluate hybrid search strategies that selectively combine database searches with snowballing. Method: We propose four hybrid search strategies combining database searches in digital libraries with iterative, parallel, or sequential backward and forward snowballing. We simulated the strategies over three existing SLRs in SE that adopted both database searches and snowballing. We compared the outcome of digital library searches, snowballing, and hybrid strategies using precision, recall, and F-measure to investigate the performance of each strategy. Results: Our results show that, for the analyzed SLRs, combining database searches from the Scopus digital library with parallel or sequential snowballing achieved the most appropriate balance of precision and recall. Conclusion: We put forward that, depending on the goals of the SLR and the available resources, using a hybrid search strategy involving a representative digital library and parallel or sequential snowballing tends to represent an appropriate alternative to be used when searching for evidence in SLRs.

SEApr 13, 2020
When to Update Systematic Literature Reviews in Software Engineering

Emilia Mendes, Claes Wohlin, Katia Felizardo et al.

[Context] Systematic Literature Reviews (SLRs) have been adopted by the Software Engineering (SE) community for approximately 15 years to provide meaningful summaries of evidence on several topics. Many of these SLRs are now potentially outdated, and there are no systematic proposals on when to update SLRs in SE. [Objective] The goal of this paper is to provide recommendations on when to update SLRs in SE. [Method] We evaluated, using a three-step approach, a third-party decision framework (3PDF) employed in other fields, to decide whether SLRs need updating. First, we conducted a literature review of SLR updates in SE and contacted the authors to obtain their feedback relating to the usefulness of the 3PDF within the context of SLR updates in SE. Second, we used these authors feedback to see whether the framework needed any adaptation; none was suggested. Third, we applied the 3PDF to the SLR updates identified in our literature review. [Results] The 3PDF showed that 14 of the 20 SLRs did not need updating. This supports the use of a decision support mechanism (such as the 3PDF) to help the SE community decide when to update SLRs. [Conclusions] We put forward that the 3PDF should be adopted by the SE community to keep relevant evidence up to date and to avoid wasting effort with unnecessary updates.

SEAug 16, 2019
Challenges in Survey Research

Stefan Wagner, Daniel Méndez Fernández, Michael Felderer et al.

While being an important and often used research method, survey research has been less often discussed on a methodological level in empirical software engineering than other types of research. This chapter compiles a set of important and challenging issues in survey research based on experiences with several large-scale international surveys. The chapter covers theory building, sampling, invitation and follow-up, statistical as well as qualitative analysis of survey data and the usage of psychometrics in software engineering surveys.

SEFeb 1, 2019
Do We Preach What We Practice? Investigating the Practical Relevance of Requirements Engineering Syllabi - The IREB Case

Daniel Méndez Fernández, Xavier Franch, Norbert Seyff et al.

Nowadays, there exist a plethora of different educational syllabi for Requirements Engineering (RE), all aiming at incorporating practically relevant educational units (EUs). Many of these syllabi are based, in one way or the other, on the syllabi provided by the International Requirements Engineering Board (IREB), a non-profit organisation devoted to standardised certification programs for RE. IREB syllabi are developed by RE experts and are, thus, based on the assumption that they address topics of practical relevance. However, little is known about to what extent practitioners actually perceive those contents as useful. We have started a study to investigate the relevance of the EUs included in the IREB Foundation Level certification programme. In a first phase reported in this paper, we have surveyed practitioners mainly from DACH countries (Germany, Austria and Switzerland) participating in the IREB certification. Later phases will widen the scope both by including other countries and by not requiring IREB-certified participants. The results shall foster a critical reflection on the practical relevance of EUs built upon the de-facto standard syllabus of IREB.

SEMay 21, 2018
Status Quo in Requirements Engineering: A Theory and a Global Family of Surveys

Stefan Wagner, Daniel Méndez Fernández, Michael Felderer et al.

Requirements Engineering (RE) has established itself as a software engineering discipline during the past decades. While researchers have been investigating the RE discipline with a plethora of empirical studies, attempts to systematically derive an empirically-based theory in context of the RE discipline have just recently been started. However, such a theory is needed if we are to define and motivate guidance in performing high quality RE research and practice. We aim at providing an empirical and valid foundation for a theory of RE, which helps software engineers establish effective and efficient RE processes. We designed a survey instrument and theory that has now been replicated in 10 countries world-wide. We evaluate the propositions of the theory with bootstrapped confidence intervals and derive potential explanations for the propositions. We report on the underlying theory and the full results obtained from the replication studies with participants from 228 organisations. Our results represent a substantial step forward towards developing an empirically-based theory of RE giving insights into current practices with RE processes. The results reveal, for example, that there are no strong differences between organisations in different countries and regions, that interviews, facilitated meetings and prototyping are the most used elicitation techniques, that requirements are often documented textually, that traces between requirements and code or design documents is common, requirements specifications themselves are rarely changed and that requirements engineering (process) improvement endeavours are mostly intrinsically motivated. Our study establishes a theory that can be used as starting point for many further studies for more detailed investigations. Practitioners can use the results as theory-supported guidance on selecting suitable RE methods and techniques.

SEJul 1, 2017
On Evidence-based Risk Management in Requirements Engineering

Daniel Méndez Fernández, Michaela Tießler, Marcos Kalinowski et al.

Background: The sensitivity of Requirements Engineering (RE) to the context makes it difficult to efficiently control problems therein, thus, hampering an effective risk management devoted to allow for early corrective or even preventive measures. Problem: There is still little empirical knowledge about context-specific RE phenomena which would be necessary for an effective context- sensitive risk management in RE. Goal: We propose and validate an evidence-based approach to assess risks in RE using cross-company data about problems, causes and effects. Research Method: We use survey data from 228 companies and build a probabilistic network that supports the forecast of context-specific RE phenomena. We implement this approach using spreadsheets to support a light-weight risk assessment. Results: Our results from an initial validation in 6 companies strengthen our confidence that the approach increases the awareness for individual risk factors in RE, and the feedback further allows for disseminating our approach into practice.

SEMar 24, 2017
Requirements Engineering Practice and Problems in Agile Projects: Results from an International Survey

Stefan Wagner, Daniel Méndez Fernández, Michael Felderer et al.

Requirements engineering (RE) is considerably different in agile development than in more traditional development processes. Yet, there is little empirical knowledge on the state of the practice and contemporary problems in agile RE. As part of a bigger survey initiative (Naming the Pain in Requirements Engineering), we build an empirical basis on such aspects of agile RE. Based on the responses of representatives from 92 different organisations, we found that agile RE concentrates on free-text documentation of requirements elicited with a variety of techniques. Often, traces between requirements and code are explicitly managed and also software testing and RE are aligned. Furthermore, continuous improvement of RE is performed due to intrinsic motivation. Important experienced problems include unclear requirements and communication flaws. Overall, we found that most organisations conduct RE in a way we would expect and that agile RE is in several aspects not so different from RE in other development processes.

SEFeb 13, 2017
Supporting Defect Causal Analysis in Practice with Cross-Company Data on Causes of Requirements Engineering Problems

Marcos Kalinowski, Pablo Curty, Aline Paes et al.

[Context] Defect Causal Analysis (DCA) represents an efficient practice to improve software processes. While knowledge on cause-effect relations is helpful to support DCA, collecting cause-effect data may require significant effort and time. [Goal] We propose and evaluate a new DCA approach that uses cross-company data to support the practical application of DCA. [Method] We collected cross-company data on causes of requirements engineering problems from 74 Brazilian organizations and built a Bayesian network. Our DCA approach uses the diagnostic inference of the Bayesian network to support DCA sessions. We evaluated our approach by applying a model for technology transfer to industry and conducted three consecutive evaluations: (i) in academia, (ii) with industry representatives of the Fraunhofer Project Center at UFBA, and (iii) in an industrial case study at the Brazilian National Development Bank (BNDES). [Results] We received positive feedback in all three evaluations and the cross-company data was considered helpful for determining main causes. [Conclusions] Our results strengthen our confidence in that supporting DCA with cross-company data is promising and should be further investigated.

SENov 28, 2016
Naming the Pain in Requirements Engineering: Comparing Practices in Brazil and Germany

Daniel Méndez Fernández, Stefan Wagner, Marcos Kalinowski et al.

As part of the Naming the Pain in Requirements Engineering (NaPiRE) initiative, researchers compared problems that companies in Brazil and Germany encountered during requirements engineering (RE). The key takeaway was that in RE, human interaction is necessary for eliciting and specifying high-quality requirements, regardless of country, project type, or company size.