56.3 | SE | Jun 2
Analyzing the Evolution of Structural Communities within Microservice Architecture
Alexander Bakhtin, Matteo Esposito, Valentina Lenarduzzi et al.
In recent years, the detection of anti-patterns in microservice architecture has gained traction, particularly to identify instances of Microservice Architectural Degradation. In such tasks, the microservice architecture is often modeled as a network of microservice dependencies. Recent works have explored how to assess the evolution of such architectural networks by considering the architecture of consecutive releases of the project. Particular anti-patterns related to the structure of the service network include Wrong cuts and Knot services. Community detection is a way to identify groups of services in a network that strongly depend on each other. If such groups cannot be mapped to business processes in the system, or if the same service belongs to multiple communities, this could indicate architectural degradation due to an inappropriate division of responsibilities or unoptimized communication. Temporal community detection methods have been proposed to analyze community structure that evolves in time. We performed temporal community detection within the microservice architecture of six releases of the train-ticket benchmark and analyzed the composition of the discovered communities and their activities over time. We observed a stable architecture with a clear separation of services into two communities, which we could identify with two business processes performed by the system. We found services belonging to several communities, as well as services within the same community with both incoming and outgoing connections. The membership strength metric provided by the leveraged algorithm enables fine-grained assessment of the microservice communities.
SE | Jan 5 | Code
The Invisible Hand of AI Libraries Shaping Open Source Projects and Communities
Matteo Esposito, Andrea Janes, Valentina Lenarduzzi et al.
In the early 1980s, Open Source Software emerged as a revolutionary concept amidst the dominance of proprietary software. What began as a revolutionary idea has now become the cornerstone of computer science. Amidst OSS projects, AI is increasing its presence and relevance. However, despite the growing popularity of AI, its adoption and impacts on OSS projects remain underexplored. We aim to assess the adoption of AI libraries in Python and Java OSS projects and examine how they shape development, including the technical ecosystem and community engagement. To this end, we will perform a large-scale analysis on 157.7k potential OSS repositories, employing repository metrics and software metrics to compare projects adopting AI libraries against those that do not. We expect to identify measurable differences in development activity, community engagement, and code complexity between OSS projects that adopt AI libraries and those that do not, offering evidence-based insights into how AI integration reshapes software development practices.
SE | Nov 14, 2025 | Code
SQuaD: The Software Quality Dataset
Mikel Robredo, Matteo Esposito, Davide Taibi et al.
Software quality research increasingly relies on large-scale datasets that measure both the product and process aspects of software systems. However, existing resources often focus on limited dimensions, such as code smells, technical debt, or refactoring activity, thereby restricting comprehensive analyses across time and quality dimensions. To address this gap, we present the Software Quality Dataset (SQuaD), a multi-dimensional, time-aware collection of software quality metrics extracted from 450 mature open-source projects across diverse ecosystems, including Apache, Mozilla, FFmpeg, and the Linux kernel. By integrating nine state-of-the-art static analysis tools, i.e., SonarQube, CodeScene, PMD, Understand, CK, JaSoMe, RefactoringMiner, RefactoringMiner++, and PyRef, our dataset unifies over 700 unique metrics at method, class, file, and project levels. Covering a total of 63,586 analyzed project releases, SQuaD also provides version control and issue-tracking histories, software vulnerability data (CVE/CWE), and process metrics proven to enhance Just-In-Time (JIT) defect prediction. The SQuaD enables empirical research on maintainability, technical debt, software evolution, and quality assessment at unprecedented scale. We also outline emerging research directions, including automated dataset updates and cross-project quality modeling to support the continuous evolution of software analytics. The dataset is publicly available on ZENODO (DOI: 10.5281/zenodo.17566690).
HC | Jul 4, 2023
Learning to Prompt in the Classroom to Understand AI Limits: A pilot study
Emily Theophilou, Cansu Koyuturk, Mona Yavari et al.
Artificial intelligence's (AI) progress holds great promise in tackling pressing societal concerns such as health and climate. Large Language Models (LLM) and the derived chatbots, like ChatGPT, have greatly improved the natural language processing capabilities of AI systems, allowing them to process an unprecedented amount of unstructured data. However, the ensuing excitement has led to negative sentiments, even as AI methods demonstrate remarkable contributions (e.g., in health and genetics). A key factor contributing to this sentiment is the misleading perception that LLMs can effortlessly provide solutions across domains, ignoring their limitations such as hallucinations and reasoning constraints. Acknowledging AI fallibility is crucial to address the impact of dogmatic overconfidence in possibly erroneous suggestions generated by LLMs. At the same time, it can reduce fear and other negative attitudes toward AI. This necessitates comprehensive AI literacy interventions that educate the public about LLM constraints and effective usage techniques, i.e., prompting strategies. With this aim, a pilot educational intervention was performed in a high school with 21 students. It involved presenting high-level concepts about intelligence, AI, and LLMs, followed by practical exercises involving ChatGPT in creating natural educational conversations and applying established prompting strategies. Encouraging preliminary results emerged, including high appreciation of the activity, improved interaction quality with the LLM, reduced negative AI sentiments, and a better grasp of limitations, specifically unreliability, limited understanding of commands leading to unsatisfactory responses, and limited presentation flexibility. Our aim is to explore AI acceptance factors and refine this approach for more controlled future studies.
SE | Jul 8, 2024
6GSoft: Software for Edge-to-Cloud Continuum
Muhammad Azeem Akbar, Matteo Esposito, Sami Hyrynsalmi et al.
In the era of 6G, developing and managing software requires cutting-edge software engineering (SE) theories and practices tailored for such complexity across a vast number of connected edge devices. Our project aims to lead the development of sustainable methods and energy-efficient orchestration models specifically for edge environments, enhancing architectural support driven by AI for contemporary edge-to-cloud continuum computing. This initiative seeks to position Finland at the forefront of the 6G landscape, focusing on sophisticated edge orchestration and robust software architectures to optimize the performance and scalability of edge networks. Collaborating with leading Finnish universities and companies, the project emphasizes deep industry-academia collaboration and international expertise to address critical challenges in edge orchestration and software architecture, aiming to drive significant advancements in software productivity and market impact.
66.9 | SE | May 12
A Research Agenda on Agents and Software Engineering: Outcomes from the Rio A2SE Seminar
Davide Taibi, Henry Muccini, Karthik Vaidhyanathan et al.
The rise of agentic AI is reshaping software engineering in two intertwined directions: agents are increasingly applied to support software engineering tasks, and Agentic AI systems themselves are complex systems that require re-thinking currently established software engineering practices. To chart a coherent research agenda covering the two directions, we organized the A2SE seminar in Rio de Janeiro, bringing together 18 experts from academia and industry. Through structured presentations, collaborative topic clustering, and focused group discussions, participants identified six thematic areas: Governance, Software Engineering for Agents, Agents for Software Architecture, Quality and Evaluation, Sustainability, and Code, and they prioritized short-term and long-term research directions for each. This paper presents the resulting community-driven, opinionated research agenda, offering the SE community a structured foundation for coordinating efforts at this critical juncture.
SE | Mar 8, 2021 | Code
Structural Coupling for Microservices
Sebastiano Panichella, Mohammad Imranur Rahman, Davide Taibi
Cloud-native Applications are 'distributed, elastic and horizontal-scalable systems composed of (micro)services which isolate states in a minimum of stateful components'. Hence, an important property is to ensure low coupling and high cohesion among the (micro)services composing the cloud-native application. Loosely coupled and highly cohesive services allow development teams to work in parallel, reducing the communication overhead between teams. However, although both practitioners and researchers agree on the importance of this general property, there are no validated metrics to effectively measure or test the actual coupling level between services. In this work, we propose ways to compute and visualize the coupling between microservices, by extending and adapting the concepts behind the computation of the traditional structural coupling. We validate these measures with a case study involving 17 open-source projects and we provide an automatic approach to measure them. The results of this study highlight how these metrics provide practitioners with a quantitative and visual view of service compositions, which can be useful to conceive advanced systems to monitor the evolution of the service.
SE | Feb 19, 2021 | Code
Exploring Factors and Metrics to Select Open Source Software Components for Integration: An Empirical Study
Xiaozhou Li, Sergio Moreschini, Zheying Zhang et al.
[Context] Open Source Software (OSS) is nowadays used and integrated in most commercial products. However, the selection of OSS projects for integration is not a simple process, mainly due to a lack of clear selection models and a lack of information from the OSS portals. [Objective] We investigate the factors and metrics that practitioners currently consider when selecting OSS. We also investigate the sources of information and portals that can be used to assess the factors, as well as the possibility to automatically extract such information with APIs. [Method] We elicited the factors and the metrics adopted to assess and compare OSS by performing a survey among 23 experienced developers who often integrate OSS in the software they develop. Moreover, we investigated the APIs of the portals adopted to assess OSS, extracting information for the most starred 100K projects in GitHub. [Result] We identified a set consisting of 8 main factors and 74 sub-factors, together with 170 related metrics that companies can use to select OSS to be integrated in their software projects. Unexpectedly, only a small part of the factors can be evaluated automatically, and out of 170 metrics, only 40 are available, of which only 22 returned information for all the 100K projects. Therefore, we recommend that project maintainers and project repositories pay attention to providing information for the projects they are hosting, so as to increase the likelihood of being adopted. [Conclusion] OSS selection can be partially automated, by extracting the information needed for the selection from portal APIs. OSS producers can benefit from our results by checking if they are providing all the information commonly required by potential adopters...
SE | Sep 7, 2019 | Code
A curated Dataset of Microservices-Based Systems
Mohammad Imranur Rahman, Sebastiano Panichella et al.
Microservices-based architectures are built on a set of modular, independent and fault-tolerant services. In recent years, the software engineering community presented studies investigating potential, recurrent, effective architectural patterns in microservices-based architectures, as they are essential to maintain and scale microservice-based systems. Indeed, the organizational structure of such systems should be reflected in so-called microservice architecture patterns that best fit the projects' and development teams' needs. However, there is a lack of public repositories sharing the microservices patterns and practices of open-source projects, which could be beneficial for teaching purposes and future research investigations. This paper tries to fill this gap by sharing a dataset containing a first curated list of microservice-based projects. Specifically, the dataset is composed of 20 open-source projects, all using specific microservice architecture patterns. Moreover, the dataset also reports information about inter-service calls or dependencies of the aforementioned projects. For the analysis, we used two different tools, (1) SLOCcount and (2) MicroDepGraph, to get different parameters for the microservice dataset. Both the microservice dataset and analysis tool are publicly available online. We believe that this dataset will be highly used by the research community for understanding more about microservices architectural and dependency patterns, enabling researchers to compare results on common projects.
SE | Aug 25, 2019 | Code
Does Code Quality Affect Pull Request Acceptance? An empirical study
Valentina Lenarduzzi, Vili Nikkola, Nyyti Saarimäki et al.
Background. Pull requests are a common practice for contributing and reviewing contributions, and are employed both in open-source and industrial contexts. One of the main goals of code reviews is to find defects in the code, allowing project maintainers to easily integrate external contributions into a project and discuss the code contributions. Objective. The goal of this paper is to understand whether code quality is actually considered when pull requests are accepted. Specifically, we aim at understanding whether code quality issues such as code smells, antipatterns, and coding style violations in the pull request code affect the chance of its acceptance when reviewed by a maintainer of the project. Method. We conducted a case study among 28 Java open-source projects, analyzing the presence of 4.7 M code quality issues in 36 K pull requests. We analyzed further correlations by applying Logistic Regression and seven machine learning techniques (Decision Tree, Random Forest, Extremely Randomized Trees, AdaBoost, Gradient Boosting, XGBoost). Results. Unexpectedly, code quality turned out not to affect the acceptance of a pull request at all. As suggested by other works, other factors such as the reputation of the maintainer and the importance of the feature delivered might be more important than code quality in terms of pull request acceptance. Conclusions. Researchers already investigated the influence of the developers' reputation and the pull request acceptance. This is the first work investigating if quality of the code in pull requests affects the acceptance of the pull request or not. We recommend that researchers further investigate this topic to understand if different measures or different tools could provide some useful measures.
SE | Aug 5, 2019 | Code
On the Relationship Between Coupling and Refactoring: An Empirical Viewpoint
Steve Counsell, Mahir Arzoky, Giuseppe Destefanis et al.
[Background] Refactoring has matured over the past twenty years to become part of a developer's toolkit. However, many fundamental research questions still remain largely unexplored. [Aim] The goal of this paper is to investigate the highest and lowest quartile of refactoring-based data using two coupling metrics - the Coupling between Objects metric and the more recent Conceptual Coupling between Classes metric to answer this question. Can refactoring trends and patterns be identified based on the level of class coupling? [Method] In this paper, we analyze over six thousand refactoring operations drawn from releases of three open-source systems to address one such question. [Results] Results showed no meaningful difference in the types of refactoring applied across either lower or upper quartile of coupling for both metrics; refactorings usually associated with coupling removal were actually more numerous in the lower quartile in some cases. A lack of inheritance-related refactorings across all systems was also noted. [Conclusions] The emerging message (and a perplexing one) is that developers seem to be largely indifferent to classes with high coupling when it comes to refactoring types - they treat classes with relatively low coupling in almost the same way.
SE | Jun 30, 2019 | Code
Are SonarQube Rules Inducing Bugs?
Valentina Lenarduzzi, Francesco Lomio, Heikki Huttunen et al.
Background. The popularity of tools for analyzing Technical Debt, and particularly the popularity of SonarQube, is increasing rapidly. SonarQube proposes a set of coding rules, which represent something wrong in the code that will soon be reflected in a fault or will increase maintenance effort. However, our local companies were not confident in the usefulness of the rules proposed by SonarQube and contracted us to investigate the fault-proneness of these rules. Objective. In this work we aim at understanding which SonarQube rules are actually fault-prone and to understand which machine learning models can be adopted to accurately identify fault-prone rules. Method. We designed and conducted an empirical study on 21 well-known mature open-source projects. We applied the SZZ algorithm to label the fault-inducing commits. We analyzed the fault-proneness by comparing the classification power of seven machine learning models. Result. Among the 202 rules defined for Java by SonarQube, only 25 can be considered to have relatively low fault-proneness. Moreover, violations considered as "bugs" by SonarQube were generally not fault-prone and, consequently, the fault-prediction power of the model proposed by SonarQube is extremely low. Conclusion. The rules applied by SonarQube for calculating technical debt should be thoroughly investigated and their harmfulness needs to be further confirmed. Therefore, companies should carefully consider which rules they really need to apply, especially if their goal is to reduce fault-proneness.
CR | Dec 16, 2024
On Large Language Models in Mission-Critical IT Governance: Are We Ready Yet?
Matteo Esposito, Francesco Palagiano, Valentina Lenarduzzi et al.
Context. The security of critical infrastructure has been a pressing concern since the advent of computers and has become even more critical in today's era of cyber warfare. Protecting mission-critical systems (MCSs), essential for national security, requires swift and robust governance, yet recent events reveal the increasing difficulty of meeting these challenges. Aim. Building on prior research showcasing the potential of Generative AI (GAI), such as Large Language Models, in enhancing risk analysis, we aim to explore practitioners' views on integrating GAI into the governance of IT MCSs. Our goal is to provide actionable insights and recommendations for stakeholders, including researchers, practitioners, and policymakers. Method. We designed a survey to collect practical experiences, concerns, and expectations of practitioners who develop and implement security solutions in the context of MCSs. Conclusions and Future Works. Our findings highlight that the safe use of LLMs in MCS governance requires interdisciplinary collaboration. Researchers should focus on designing regulation-oriented models and focus on accountability; practitioners emphasize data protection and transparency, while policymakers must establish a unified AI framework with global benchmarks to ensure ethical and secure LLMs-based MCS governance.
SE | Mar 17, 2025
Generative AI for Software Architecture. Applications, Challenges, and Future Directions
Matteo Esposito, Xiaozhou Li, Sergio Moreschini et al.
Context: Generative Artificial Intelligence (GenAI) is transforming much of software development, yet its application in software architecture is still in its infancy, and no prior study has systematically addressed the topic. Aim: We aim to systematically synthesize the use, rationale, contexts, usability, and future challenges of GenAI in software architecture. Method: We performed a multivocal literature review (MLR), analyzing peer-reviewed and gray literature, identifying current practices, models, adoption contexts, and reported challenges, extracting themes via open coding. Results: Our review identified significant adoption of GenAI for architectural decision support and architectural reconstruction. OpenAI GPT models are predominantly applied, and there is consistent use of techniques such as few-shot prompting and retrieval-augmented generation (RAG). GenAI has been applied mostly to initial stages of the Software Development Life Cycle (SDLC), such as Requirements-to-Architecture and Architecture-to-Code. Monolithic and microservice architectures were the dominant targets. However, rigorous testing of GenAI outputs was typically missing from the studies. Among the most frequent challenges are model precision, hallucinations, ethical aspects, privacy issues, lack of architecture-specific datasets, and the absence of sound evaluation frameworks. Conclusions: GenAI shows significant potential in software design, but several challenges remain on its path to greater adoption. Research efforts should target designing general evaluation methodologies, handling ethics and precision, increasing transparency and explainability, and promoting architecture-specific datasets and benchmarks to bridge the gap between theoretical possibilities and practical use.
HC | Apr 10, 2025
Understanding Learner-LLM Chatbot Interactions and the Impact of Prompting Guidelines
Cansu Koyuturk, Emily Theophilou, Sabrina Patania et al.
Large Language Models (LLMs) have transformed human-computer interaction by enabling natural language-based communication with AI-powered chatbots. These models are designed to be intuitive and user-friendly, allowing users to articulate requests with minimal effort. However, despite their accessibility, studies reveal that users often struggle with effective prompting, resulting in inefficient responses. Existing research has highlighted both the limitations of LLMs in interpreting vague or poorly structured prompts and the difficulties users face in crafting precise queries. This study investigates learner-AI interactions through an educational experiment in which participants receive structured guidance on effective prompting. We introduce and compare three types of prompting guidelines: a task-specific framework developed through a structured methodology and two baseline approaches. To assess user behavior and prompting efficacy, we analyze a dataset of 642 interactions from 107 users. Using Von NeuMidas, an extended pragmatic annotation schema for LLM interaction analysis, we categorize common prompting errors and identify recurring behavioral patterns. We then evaluate the impact of different guidelines by examining changes in user behavior, adherence to prompting strategies, and the overall quality of AI-generated responses. Our findings provide a deeper understanding of how users engage with LLMs and the role of structured prompting guidance in enhancing AI-assisted communication. By comparing different instructional frameworks, we offer insights into more effective approaches for improving user competency in AI interactions, with implications for AI literacy, chatbot usability, and the design of more responsive AI systems.
SE | Jun 27, 2025
Autonomic Microservice Management via Agentic AI and MAPE-K Integration
Matteo Esposito, Alexander Bakhtin, Noman Ahmad et al.
While microservices are revolutionizing cloud computing by offering unparalleled scalability and independent deployment, their decentralized nature poses significant security and management challenges that can threaten system stability. We propose a framework based on MAPE-K, which leverages agentic AI, for autonomous anomaly detection and remediation to address the daunting task of highly distributed system management. Our framework offers practical, industry-ready solutions for maintaining robust and secure microservices. Practitioners and researchers can customize the framework to enhance system stability, reduce downtime, and monitor broader system quality attributes such as system performance level, resilience, security, and anomaly management, among others.
CY | Mar 4, 2025
Use Me Wisely: AI-Driven Assessment for LLM Prompting Skills Development
Dimitri Ognibene, Gregor Donabauer, Emily Theophilou et al.
The use of large language model (LLM)-powered chatbots, such as ChatGPT, has become popular across various domains, supporting a range of tasks and processes. However, due to the intrinsic complexity of LLMs, effective prompting is more challenging than it may seem. This highlights the need for innovative educational and support strategies that are both widely accessible and seamlessly integrated into task workflows. Yet, LLM prompting is highly task- and domain-dependent, limiting the effectiveness of generic approaches. In this study, we explore whether LLM-based methods can facilitate learning assessments by using ad-hoc guidelines and a minimal number of annotated prompt samples. Our framework transforms these guidelines into features that can be identified within learners' prompts. Using these feature descriptions and annotated examples, we create few-shot learning detectors. We then evaluate different configurations of these detectors, testing three state-of-the-art LLMs and ensembles. We run experiments with cross-validation on a sample of original prompts, as well as tests on prompts collected from task-naive learners. Our results show how LLMs perform on feature detection. Notably, GPT-4 demonstrates strong performance on most features, while closely related models, such as GPT-3 and GPT-3.5 Turbo (Instruct), show inconsistent behaviors in feature classification. These differences highlight the need for further research into how design choices impact feature selection and prompt detection. Our findings contribute to the fields of generative AI literacy and computer-supported learning assessment, offering valuable insights for both researchers and practitioners.
62.6 | SE | Mar 31
Making Sense of AI Agents Hype: Adoption, Architectures, and Takeaways from Practitioners
Ruoyu Su, Matteo Esposito, Roberta Capuano et al.
To support practitioners in understanding how agentic systems are designed in real-world industrial practice, we present a review of practitioner conference talks on AI agents. We analyzed 138 recorded talks to examine how companies adopt agent-based architectures (Objective 1), identify recurring architectural strategies and patterns (Objective 2), and analyze application domains and technologies used to implement and operate LLM-driven agentic systems (Objective 3).
CL | Jun 11, 2024
Beyond Words: On Large Language Models Actionability in Mission-Critical Risk Analysis
Matteo Esposito, Francesco Palagiano, Valentina Lenarduzzi et al.
Context. Risk analysis assesses potential risks in specific scenarios. Risk analysis principles are context-less; the same methodology can be applied to a risk connected to health and information technology security. Risk analysis requires a vast knowledge of national and international regulations and standards and is time and effort-intensive. A large language model can quickly summarize information in less time than a human and can be fine-tuned to specific tasks. Aim. Our empirical study aims to investigate the effectiveness of Retrieval-Augmented Generation and fine-tuned LLM in risk analysis. To our knowledge, no prior study has explored its capabilities in risk analysis. Method. We manually curated 193 unique scenarios leading to 1283 representative samples from over 50 mission-critical analyses archived by the industrial context team in the last five years. We compared the base GPT-3.5 and GPT-4 models versus their Retrieval-Augmented Generation and fine-tuned counterparts. We employ two human experts as competitors of the models and three other human experts to review the models and the former human experts' analysis. The reviewers analyzed 5,000 scenario analyses. Results and Conclusions. Human experts demonstrated higher accuracy, but LLMs are quicker and more actionable. Moreover, our findings show that RAG-assisted LLMs have the lowest hallucination rates, effectively uncovering hidden risks and complementing human expertise. Thus, the choice of model depends on specific needs, with FTMs for accuracy, RAG for hidden risks discovery, and base models for comprehensiveness and actionability. Therefore, experts can leverage LLMs as an effective complementing companion in risk analysis within a condensed timeframe. They can also save costs by averting unnecessary expenses associated with implementing unwarranted countermeasures.
SE | Jun 10, 2024
$Classi|Q\rangle$ Towards a Translation Framework To Bridge The Classical-Quantum Programming Gap
Matteo Esposito, Maryam Tavassoli Sabzevari, Boshuai Ye et al.
Quantum computing, albeit readily available as hardware or emulated on the cloud, is still far from being available in general regarding complex programming paradigms and learning curves. This vision paper introduces $Classi|Q\rangle$, a translation framework idea to bridge Classical and Quantum Computing by translating high-level programming languages, e.g., Python or C++, into a low-level language, e.g., Quantum Assembly. Our idea paper serves as a blueprint for ongoing efforts in quantum software engineering, offering a roadmap for further $Classi|Q\rangle$ development to meet the diverse needs of researchers and practitioners. $Classi|Q\rangle$ is designed to empower researchers and practitioners with no prior quantum experience to harness the potential of hybrid quantum computation. We also discuss future enhancements to $Classi|Q\rangle$, including support for additional quantum languages, improved optimization strategies, and integration with emerging quantum computing platforms.
SE | May 25, 2023
AI Techniques in the Microservices Life-Cycle: A Systematic Mapping Study
Sergio Moreschini, Shahrzad Pour, Ivan Lanese et al.
The use of AI in microservices (MSs) is an emerging field, as indicated by a substantial number of surveys. However, these surveys focus on a specific problem using specific AI techniques, and therefore do not fully capture the growth of research and the rise and disappearance of trends. In our systematic mapping study, we take an exhaustive approach to reveal all possible connections between the use of AI techniques for improving any quality attribute (QA) of MSs during the DevOps phases. Our results include 16 research themes that connect to the intersection of particular QAs, AI domains and DevOps phases. Moreover, by mapping identified future research challenges and relevant industry domains, we can show that many studies aim to deliver prototypes to be automated at a later stage, aiming at providing exploitable products in a number of key industry domains.
SE | Oct 7, 2020
Empirical Standards for Software Engineering Research
Paul Ralph, Nauman bin Ali, Sebastian Baltes et al.
Empirical Standards are natural-language models of a scientific community's expectations for a specific kind of study (e.g. a questionnaire survey). The ACM SIGSOFT Paper and Peer Review Quality Initiative generated empirical standards for research methods commonly used in software engineering. These living documents, which should be continuously revised to reflect evolving consensus around research best practices, will improve research quality and make peer review more effective, reliable, transparent and fair.
SEJul 1, 2020
Motivations, Benefits, and Issues for Adopting Micro-Frontends: A Multivocal Literature ReviewSeveri Peltonen, Luca Mezzalira, Davide Taibi
[Context] Micro-Frontends are increasing in popularity, being adopted by several large companies, such as DAZN, Ikea, Starbucks and many others. Micro-Frontends enable splitting of monolithic frontends into independent and smaller micro applications. However, many companies are still hesitant to adopt Micro-Frontends, due to the lack of knowledge concerning their benefits. Additionally, the available online documentation is often confusing and contradictory. [Objective] The goal of this work is to map the existing knowledge on Micro-Frontends, by understanding the motivations of companies when adopting such applications as well as possible benefits and issues. [Method] We conducted a Multivocal Literature Review, analyzing 43 sources and classifying motivations, benefits and issues. [Results] The results show that existing architectural options to build web applications are cumbersome if the application and development team grows, and if multiple teams need to develop the same frontend application. The application of Micro-Frontends confirmed the expected benefits, and Micro-Frontends were found to provide the same benefits as microservices on the back end side, combining the development team into a fully cross-functional development team that can scale processes when needed. However, Micro-Frontends also showed some issues, such as the increased payload size of the application, increased code duplication and coupling between teams, and monitoring complexity. [Conclusions] Micro-Frontends allow companies to scale development according to business needs in the same way microservices do with the back end side. In addition, ...
SESep 19, 2019
From Monolithic Systems to Microservices: An Assessment FrameworkFlorian Auer, Valentina Lenarduzzi, Michael Felderer et al.
Context. Re-architecting monolithic systems with Microservices-based architecture is a common trend. Various companies are migrating to Microservices for different reasons. However, a decision as important as re-architecting an entire system must be based on real facts and not only on gut feelings. Objective. The goal of this work is to propose an evidence-based decision support framework for companies that need to migrate to Microservices, based on the analysis of a set of characteristics and metrics they should collect before re-architecting their monolithic system. Method. We designed this study with a mixed-methods approach combining a Systematic Mapping Study with a survey done in the form of interviews with professionals to derive the assessment framework based on Grounded Theory. Results. We identified a set of information and metrics that companies can use to decide whether to migrate to Microservices or not. The proposed assessment framework, based on the aforementioned metrics, could be useful for companies if they need to migrate to Microservices and do not want to run the risk of failing to consider some important information.
SEAug 30, 2019
Some SonarQube Issues have a Significant but Small Effect on Faults and Changes. A large-scale empirical studyValentina Lenarduzzi, Nyyti Saarimäki, Davide Taibi
Context. Companies commonly invest effort to remove technical issues believed to impact software qualities, such as removing anti-patterns or coding style violations. Objective. Our aim is to analyze the diffuseness of Technical Debt (TD) items in software systems and to assess their impact on code changes and fault-proneness, considering also the type of TD items and their severity. Method. We conducted a case study among 33 Java projects from the Apache Software Foundation (ASF) repository. We analyzed 726 commits containing 27K faults and 12M changes. The projects violated 173 SonarQube rules generating more than 95K TD items in more than 200K classes. Results. Clean classes (classes not affected by TD items) are less change-prone than dirty ones, but the difference between the groups is small. Clean classes are slightly more change-prone than classes affected by TD items of type Code Smell or Security Vulnerability. As for fault-proneness, there is no difference between clean and dirty classes. Moreover, we found many incongruities in the type and severity level assigned by SonarQube. Conclusions. Our results can be useful for practitioners to understand which TD items they should refactor and for researchers to bridge the missing gaps. They can also support companies and tool vendors in identifying TD items as accurately as possible.
SEAug 27, 2019
Continuous Architecting with Microservices and DevOps: A Systematic Mapping StudyDavide Taibi, Valentina Lenarduzzi, Claus Pahl
Context: Several companies are migrating their information systems into the Cloud. Microservices and DevOps are two of the most commonly adopted technologies. However, there is still a lack of understanding of how to adopt a microservice-based architectural style and which tools and techniques to use in a continuous architecting pipeline. Objective: We aim at characterizing the different microservice architectural style principles and patterns in order to map existing tools and techniques adopted in the context of DevOps. Methodology: We conducted a Systematic Mapping Study identifying the goal and the research questions, the bibliographic sources, the search strings, and the selection criteria to retrieve the most relevant papers. Results: We identified several agreed microservice architectural principles and patterns widely adopted and reported in 23 case studies, together with a summary of the advantages, disadvantages, and lessons learned for each pattern from the case studies. Finally, we mapped the existing microservices-specific techniques in order to understand how to continuously deliver value in a DevOps pipeline. We depicted the current research, reporting gaps and trends. Conclusion: Different patterns emerge for different migration, orchestration, storage and deployment settings. The results also show the lack of empirical work on microservices-specific techniques, especially for the release phase in DevOps.
SEAug 22, 2019
A Decomposition and Metric-Based Evaluation Framework for MicroservicesDavide Taibi, Kari Systä
Migrating from monolithic systems to microservices is a very complex task. Companies commonly decompose the monolithic system manually, analyzing dependencies of the monolith and then assessing different decomposition options. The goal of our work is twofold: 1) we provide a microservice measurement framework to objectively evaluate and compare the quality of microservices-based systems; 2) we propose a decomposition system based on business process mining. The microservice measurement framework can be applied independently from the decomposition process adopted, but is also useful to continuously evaluate the architectural evolution of a system. Results show that the decomposition framework helps companies to easily identify the different decomposition options. The measurement framework can help to decrease the subjectivity of the decision between different decomposition options and to evaluate architectural erosion in existing systems.
SEAug 12, 2019
Right Scaling for Right Pricing: A Case Study on Total Cost of Ownership Measurement for Cloud MigrationPierangelo Rosati, Frank Fowley, Claus Pahl et al.
Cloud computing promises traditional enterprises and independent software vendors a myriad of advantages over on-premise installations, including cost, operational and organizational efficiencies. The decision to migrate software configured for on-premise delivery to the cloud requires careful technical consideration and planning. In this chapter, we discuss the impact of right-scaling on the cost modelling for migration decision making and price setting of software for commercial resale. An integrated process is presented for measuring total cost of ownership, taking into account IaaS/PaaS resource consumption based on forecast SaaS usage levels. The process is illustrated with a real-world case study.
SEAug 12, 2019
Microservices Anti Patterns: A TaxonomyDavide Taibi, Valentina Lenarduzzi, Claus Pahl
Several companies are re-architecting their monolithic information systems with microservices. However, many companies migrated without experience on microservices, mainly learning how to migrate from books or from practitioners' blogs. Because of the novelty of the topic, practitioners and consultancies are learning how to migrate by doing, thus facing several issues but also gaining several benefits. In this chapter, we introduce a catalog and a taxonomy of the most common microservices anti-patterns in order to identify common problems. Our anti-pattern catalog is based on the experience summarized by different practitioners we interviewed in the last three years. We identified a taxonomy of 20 anti-patterns, including organizational (team oriented and technology/tool oriented) anti-patterns and technical (internal and communication) anti-patterns. The results can be useful to practitioners to avoid experiencing the same difficult situations in the systems they develop. Moreover, researchers can benefit from this catalog and further validate the harmfulness of the anti-patterns identified.
SEAug 5, 2019
An Empirical Study on Technical Debt in a Finnish SMEValentina Lenarduzzi, Teemu Orava, Nyyti Saarimäki et al.
Objective. In this work, we report the experience of a Finnish SME in managing Technical Debt (TD), investigating the most common types of TD they faced in the past, their causes, and their effects. Method. We set up a focus group in the case-company, involving different roles. Results. The results showed that the most significant TD in the company stems from disagreements with the supplier and lack of test automation. Specification and test TD are the most significant types of TD. Budget and time constraints were identified as the most important root causes of TD. Conclusion. TD occurs when time or budget is limited or the amount of work is not understood properly. However, not all postponed activities generated "debt". Sometimes the accumulation of TD helped meet deadlines without a major impact, while in other cases the cost for repaying the TD was much higher than the benefits. From this study, we learned that learning, careful estimations, and continuous improvement could be good strategies to mitigate TD. These strategies include iterative validation with customers, efficient communication with stakeholders, meta-cognition in estimations, and value orientation in budgeting and scheduling.
SEAug 2, 2019
The Technical Debt DatasetValentina Lenarduzzi, Nyyti Saarimäki, Davide Taibi
Technical Debt analysis is increasing in popularity as nowadays researchers and industry are adopting various tools for static code analysis to evaluate the quality of their code. Despite this, empirical studies on software projects are expensive because of the time needed to analyze the projects. In addition, the results are difficult to compare as studies commonly consider different projects. In this work, we propose the Technical Debt Dataset, a curated set of project measurement data from 33 Java projects from the Apache Software Foundation. In the Technical Debt Dataset, we analyzed all commits from separately defined time frames with SonarQube to collect Technical Debt information and with Ptidej to detect code smells. Moreover, we extracted all available commit information from the git logs, the refactoring applied with Refactoring Miner, and fault information reported in the issue trackers (Jira). Using this information, we executed the SZZ algorithm to identify the fault-inducing and -fixing commits. We analyzed 78K commits from the selected 33 projects, detecting 1.8M SonarQube issues, 38K code smells, 28K faults and 57K refactorings. The project analysis took more than 200 days. In this paper, we describe the data retrieval pipeline together with the tools used for the analysis. The dataset is made available through CSV files and an SQLite database to facilitate queries on the data. The Technical Debt Dataset aims to open up diverse opportunities for Technical Debt research, enabling researchers to compare results on common projects.
SEAug 2, 2019
Towards Surgically-Precise Technical Debt Estimation: Early Results and Research RoadmapValentina Lenarduzzi, Antonio Martini, Davide Taibi et al.
The concept of technical debt has been explored from many perspectives, but its precise estimation is still under heavy empirical and experimental inquiry. We aim to understand whether, by harnessing approximate, data-driven, machine-learning approaches, it is possible to improve the current techniques for technical debt estimation, as represented by a top industry quality analysis tool such as SonarQube. For the sake of simplicity, we focus on relatively simple regression modelling techniques and apply them to modelling the additional project cost connected to the sub-optimal conditions existing in the projects under study. Our results show that current techniques can be improved towards a more precise estimation of technical debt, and the case study shows promising results towards the identification of more accurate estimation of technical debt.
SEApr 29, 2019
Technical Debt Prioritization: State of the Art. A Systematic Literature ReviewValentina Lenarduzzi, Terese Besker, Davide Taibi et al.
Background. Software companies need to manage and refactor Technical Debt issues. Therefore, it is necessary to understand if and when refactoring Technical Debt should be prioritized with respect to developing features or fixing bugs. Objective. The goal of this study is to investigate the existing body of knowledge in software engineering to understand what Technical Debt prioritization approaches have been proposed in research and industry. Method. We conducted a Systematic Literature Review among 384 unique papers published until 2018, following a consolidated methodology applied in Software Engineering. We included 38 primary studies. Results. Different approaches have been proposed for Technical Debt prioritization, all having different goals and optimizing on different criteria. The proposed measures capture only a small part of the plethora of factors used to prioritize Technical Debt qualitatively in practice. We report an impact map of such factors. However, there is a lack of an empirical and validated set of tools. Conclusion. We observed that Technical Debt prioritization research is preliminary and there is no consensus on what the important factors are and how to measure them. Consequently, we cannot consider current research conclusive, and in this paper we outline different directions for necessary future investigations.
SEApr 26, 2019
Are Architectural Smells Independent from Code Smells? An Empirical StudyFrancesca Arcelli Fontana, Valentina Lenarduzzi, Riccardo Roveda et al.
Background. Architectural smells and code smells are symptoms of bad code or design that can cause different quality problems, such as faults, technical debt, or difficulties with maintenance and evolution. Some studies show that code smells and architectural smells often appear together in the same file. The correlation between code smells and architectural smells, however, is not clear yet; some studies on a limited set of projects have claimed that architectural smells can be derived from code smells, while other studies claim the opposite. Objective. The goal of this work is to understand whether architectural smells are independent from code smells or can be derived from a code smell or from one category of them. Method. We conducted a case study analyzing the correlations among 19 code smells, six categories of code smells, and four architectural smells. Results. The results show that architectural smells are correlated with code smells only in a very low number of occurrences and therefore cannot be derived from code smells. Conclusion. Architectural smells are independent from code smells, and therefore deserve special attention by researchers, who should investigate their actual harmfulness, and practitioners, who should consider whether and when to remove them.
SEFeb 17, 2019
Does Migrate a Monolithic System to Microservices Decrease the Technical Debt?Valentina Lenarduzzi, Francesco Lomio, Nyyti Saarimäki et al.
Background. The migration from monolithic systems to microservices involves deep refactoring of the systems. Therefore, the migration usually has a big economic impact and companies tend to postpone several activities during this process, mainly to speed up the migration itself, but also because of the need to release new features. Objective. We monitored the Technical Debt of a small and medium enterprise while migrating a legacy monolithic system to an ecosystem of microservices to analyze changes in the code technical debt before and after the migration to microservices. Method. We conducted a case study analyzing more than four years of the history of a big project (280K Lines of Code) where two teams extracted five business processes from the monolithic system as microservices, by first analyzing the Technical Debt with SonarQube and then performing a qualitative study with the developers to understand the perceived quality of the system and the motivation for any postponed activities. Result. The development of microservices helps to reduce the Technical Debt in the long run. Despite an initial spike in the Technical Debt, due to the development of the new microservice, after a relatively short period, the Technical Debt tends to grow slower than in the monolithic system.
SEOct 25, 2018
Microservices, Continuous Architecture, and Technical Debt Interest: An Empirical StudyValentina Lenarduzzi, Davide Taibi
Continuous Architecture (CA) is an approach that supports companies in decreasing the time between deliveries. Migration to microservices is one of the most common situations when companies adopt continuous architecting processes [4]. Companies commonly adopt an initial migration strategy to extract some components from the monolithic system as microservices, making use of simplified microservices patterns [5][4]. As an example, companies commonly directly connect the microservices to the legacy monolithic system and do not adopt any message bus at the beginning. When the system starts to grow in complexity, they usually start re-architecting their systems, considering different architectural patterns. Some companies introduce API gateway patterns to simplify the management and load balancing of the different services, while others also consider a lightweight message bus [4][5][6]. All these architectural changes require deep refactoring of the system, thereby increasing the risk of new faults being introduced. In this paper, we report the preliminary results of work in progress, where we monitored the Technical Debt (TD) of a small and medium enterprise (SME) that adopted a CA approach while migrating a legacy monolithic system to an ecosystem of microservices. To the best of our knowledge, no studies exist on the impact of postponed activities on the TD, especially in the context of CA and microservices. This work will help companies understand how TD grows and changes over time while at the same time opening up new avenues for future research on the analysis of TD interest in continuous architecting processes.