Justus Bogner

SE
h-index33
25papers
888citations
Novelty21%
AI Score36

25 Papers

73.7SEJun 3
The State of Peer Review in Empirical Software Engineering: A Community Survey on Review Load, Quality, and GenAI Use

Justus Bogner, Roberto Verdecchia

The scientific peer review system has been slowly deteriorating over the last years, and not just within empirical software engineering (ESE) research. Increased submission numbers, high workload, and the rise of generative AI use with all its associated issues have made many cracks in the system more visible. To get a better understanding of the current state of peer review in the ESE community, we conducted a questionnaire survey, which accumulated 120 responses. We report on (i) the perceived review load of community members, (ii) review quality perception as well as frequent challenges for and issues with reviews, (iii) the use of LLM-based tools in the reviewing process, and (iv) the community's suggestions for improving the peer review system. We hope that these community opinions can facilitate more evidence-based discussions about how people want to see the review system change for the better.

SEJun 3, 2022
Can Requirements Engineering Support Explainable Artificial Intelligence? Towards a User-Centric Approach for Explainability Requirements

Umm-e-Habiba, Justus Bogner, Stefan Wagner

With the recent proliferation of artificial intelligence systems, there has been a surge in the demand for explainability of these systems. Explanations help to reduce system opacity, support transparency, and increase stakeholder trust. In this position paper, we discuss synergies between requirements engineering (RE) and Explainable AI (XAI). We highlight challenges in the field of XAI, and propose a framework and research directions on how RE practices can help to mitigate these challenges.

SESep 11, 2024
How Mature is Requirements Engineering for AI-based Systems? A Systematic Mapping Study on Practices, Challenges, and Future Research Directions

Umm-e- Habiba, Markus Haug, Justus Bogner et al.

Artificial intelligence (AI) permeates all fields of life, which resulted in new challenges in requirements engineering for artificial intelligence (RE4AI), e.g., the difficulty in specifying and validating requirements for AI or considering new quality requirements due to emerging ethical implications. It is currently unclear if existing RE methods are sufficient or if new ones are needed to address these challenges. Therefore, our goal is to provide a comprehensive overview of RE4AI to researchers and practitioners. What has been achieved so far, i.e., what practices are available, and what research gaps and challenges still need to be addressed? To achieve this, we conducted a systematic mapping study combining query string search and extensive snowballing. The extracted data was aggregated, and results were synthesized using thematic analysis. Our selection process led to the inclusion of 126 primary studies. Existing RE4AI research focuses mainly on requirements analysis and elicitation, with most practices applied in these areas. Furthermore, we identified requirements specification, explainability, and the gap between machine learning engineers and end-users as the most prevalent challenges, along with a few others. Additionally, we proposed seven potential research directions to address these challenges. Practitioners can use our results to identify and select suitable RE methods for working on their AI-based systems, while researchers can build on the identified gaps and research directions to push the field forward.

SEMar 23, 2023
Design Patterns for AI-based Systems: A Multivocal Literature Review and Pattern Repository

Lukas Heiland, Marius Hauser, Justus Bogner

Systems with artificial intelligence components, so-called AI-based systems, have gained considerable attention recently. However, many organizations have issues with achieving production readiness with such systems. As a means to improve certain software quality attributes and to address frequently occurring problems, design patterns represent proven solution blueprints. While new patterns for AI-based systems are emerging, existing patterns have also been adapted to this new context. The goal of this study is to provide an overview of design patterns for AI-based systems, both new and adapted ones. We want to collect and categorize patterns, and make them accessible for researchers and practitioners. To this end, we first performed a multivocal literature review (MLR) to collect design patterns used with AI-based systems. We then integrated the created pattern collection into a web-based pattern repository to make the patterns browsable and easy to find. As a result, we selected 51 resources (35 white and 16 gray ones), from which we extracted 70 unique patterns used for AI-based systems. Among these are 34 new patterns and 36 traditional ones that have been adapted to this context. Popular pattern categories include "architecture" (25 patterns), "deployment" (16), "implementation" (9), or "security & safety" (9). While some patterns with four or more mentions already seem established, the majority of patterns have only been mentioned once or twice (51 patterns). Our results in this emerging field can be used by researchers as a foundation for follow-up studies and by practitioners to discover relevant patterns for informing the design of AI-based systems.

LGJul 3, 2024
The More the Merrier? Navigating Accuracy vs. Energy Efficiency Design Trade-Offs in Ensemble Learning Systems

Rafiullah Omar, Justus Bogner, Henry Muccini et al.

Background: Machine learning (ML) model composition is a popular technique to mitigate shortcomings of a single ML model and to design more effective ML-enabled systems. While ensemble learning, i.e., forwarding the same request to several models and fusing their predictions, has been studied extensively for accuracy, we have insufficient knowledge about how to design energy-efficient ensembles. Objective: We therefore analyzed three types of design decisions for ensemble learning regarding a potential trade-off between accuracy and energy consumption: a) ensemble size, i.e., the number of models in the ensemble, b) fusion methods (majority voting vs. a meta-model), and c) partitioning methods (whole-dataset vs. subset-based training). Methods: By combining four popular ML algorithms for classification in different ensembles, we conducted a full factorial experiment with 11 ensembles x 4 datasets x 2 fusion methods x 2 partitioning methods (176 combinations). For each combination, we measured accuracy (F1-score) and energy consumption in J (for both training and inference). Results: While a larger ensemble size significantly increased energy consumption (size 2 ensembles consumed 37.49% less energy than size 3 ensembles, which in turn consumed 26.96% less energy than the size 4 ensembles), it did not significantly increase accuracy. Furthermore, majority voting outperformed meta-model fusion both in terms of accuracy (Cohen's d of 0.38) and energy consumption (Cohen's d of 0.92). Lastly, subset-based training led to significantly lower energy consumption (Cohen's d of 0.91), while training on the whole dataset did not increase accuracy significantly. Conclusions: From a Green AI perspective, we recommend designing ensembles of small size (2 or maximum 3 models), using subset-based training, majority voting, and energy-efficient ML algorithms like decision trees, Naive Bayes, or KNN.

SEMar 23, 2023
A Case Study on AI Engineering Practices: Developing an Autonomous Stock Trading System

Marcel Grote, Justus Bogner

Today, many systems use artificial intelligence (AI) to solve complex problems. While this often increases system effectiveness, developing a production-ready AI-based system is a difficult task. Thus, solid AI engineering practices are required to ensure the quality of the resulting system and to improve the development process. While several practices have already been proposed for the development of AI-based systems, detailed practical experiences of applying these practices are rare. In this paper, we aim to address this gap by collecting such experiences during a case study, namely the development of an autonomous stock trading system that uses machine learning functionality to invest in stocks. We selected 10 AI engineering practices from the literature and systematically applied them during development, with the goal to collect evidence about their applicability and effectiveness. Using structured field notes, we documented our experiences. Furthermore, we also used field notes to document challenges that occurred during the development, and the solutions we applied to overcome them. Afterwards, we analyzed the collected field notes, and evaluated how each practice improved the development. Lastly, we compared our evidence with existing literature. Most applied practices improved our system, albeit to varying extent, and we were able to overcome all major challenges. The qualitative results provide detailed accounts about 10 AI engineering practices, as well as challenges and solutions associated with such a project. Our experiences therefore enrich the emerging body of evidence in this field, which may be especially helpful for practitioner teams new to AI engineering.

SENov 22, 2023
Analyzing the Evolution and Maintenance of ML Models on Hugging Face

Joel Castaño, Silverio Martínez-Fernández, Xavier Franch et al.

Hugging Face (HF) has established itself as a crucial platform for the development and sharing of machine learning (ML) models. This repository mining study, which delves into more than 380,000 models using data gathered via the HF Hub API, aims to explore the community engagement, evolution, and maintenance around models hosted on HF, aspects that have yet to be comprehensively explored in the literature. We first examine the overall growth and popularity of HF, uncovering trends in ML domains, framework usage, authors grouping and the evolution of tags and datasets used. Through text analysis of model card descriptions, we also seek to identify prevalent themes and insights within the developer community. Our investigation further extends to the maintenance aspects of models, where we evaluate the maintenance status of ML models, classify commit messages into various categories (corrective, perfective, and adaptive), analyze the evolution across development stages of commits metrics and introduce a new classification system that estimates the maintenance status of models based on multiple attributes. This study aims to provide valuable insights about ML model maintenance and evolution that could inform future model development strategies on platforms like HF.

SEDec 15, 2023
A Synthesis of Green Architectural Tactics for ML-Enabled Systems

Heli Järvenpää, Patricia Lago, Justus Bogner et al.

The rapid adoption of artificial intelligence (AI) and machine learning (ML) has generated growing interest in understanding their environmental impact and the challenges associated with designing environmentally friendly ML-enabled systems. While Green AI research, i.e., research that tries to minimize the energy footprint of AI, is receiving increasing attention, very few concrete guidelines are available on how ML-enabled systems can be designed to be more environmentally sustainable. In this paper, we provide a catalog of 30 green architectural tactics for ML-enabled systems to fill this gap. An architectural tactic is a high-level design technique to improve software quality, in our case environmental sustainability. We derived the tactics from the analysis of 51 peer-reviewed publications that primarily explore Green AI, and validated them using a focus group approach with three experts. The 30 tactics we identified are aimed to serve as an initial reference guide for further exploration into Green AI from a software engineering perspective, and assist in designing sustainable ML-enabled systems. To enhance transparency and facilitate their widespread use and extension, we make the tactics available online in easily consumable formats. Wide-spread adoption of these tactics has the potential to substantially reduce the societal impact of ML-enabled systems regarding their energy and carbon footprint.

SEJun 2, 2025
Greening AI-enabled Systems with Software Engineering: A Research Agenda for Environmentally Sustainable AI Practices

Luís Cruz, João Paulo Fernandes, Maja H. Kirkeby et al.

The environmental impact of Artificial Intelligence (AI)-enabled systems is increasing rapidly, and software engineering plays a critical role in developing sustainable solutions. The "Greening AI with Software Engineering" CECAM-Lorentz workshop (no. 1358, 2025) funded by the Centre Européen de Calcul Atomique et Moléculaire and the Lorentz Center, provided an interdisciplinary forum for 29 participants, from practitioners to academics, to share knowledge, ideas, practices, and current results dedicated to advancing green software and AI research. The workshop was held February 3-7, 2025, in Lausanne, Switzerland. Through keynotes, flash talks, and collaborative discussions, participants identified and prioritized key challenges for the field. These included energy assessment and standardization, benchmarking practices, sustainability-aware architectures, runtime adaptation, empirical methodologies, and education. This report presents a research agenda emerging from the workshop, outlining open research directions and practical recommendations to guide the development of environmentally sustainable AI-enabled systems rooted in software engineering principles.

LGApr 30, 2024
How to Sustainably Monitor ML-Enabled Systems? Accuracy and Energy Efficiency Tradeoffs in Concept Drift Detection

Rafiullah Omar, Justus Bogner, Joran Leest et al.

ML-enabled systems that are deployed in a production environment typically suffer from decaying model prediction quality through concept drift, i.e., a gradual change in the statistical characteristics of a certain real-world domain. To combat this, a simple solution is to periodically retrain ML models, which unfortunately can consume a lot of energy. One recommended tactic to improve energy efficiency is therefore to systematically monitor the level of concept drift and only retrain when it becomes unavoidable. Different methods are available to do this, but we know very little about their concrete impact on the tradeoff between accuracy and energy efficiency, as these methods also consume energy themselves. To address this, we therefore conducted a controlled experiment to study the accuracy vs. energy efficiency tradeoff of seven common methods for concept drift detection. We used five synthetic datasets, each in a version with abrupt and one with gradual drift, and trained six different ML models as base classifiers. Based on a full factorial design, we tested 420 combinations (7 drift detectors * 5 datasets * 2 types of drift * 6 base classifiers) and compared energy consumption and drift detection accuracy. Our results indicate that there are three types of detectors: a) detectors that sacrifice energy efficiency for detection accuracy (KSWIN), b) balanced detectors that consume low to medium energy with good accuracy (HDDM_W, ADWIN), and c) detectors that consume very little energy but are unusable in practice due to very poor accuracy (HDDM_A, PageHinkley, DDM, EDDM). By providing rich evidence for this energy efficiency tactic, our findings support ML practitioners in choosing the best suited method of concept drift detection for their ML-enabled systems.

SEApr 5, 2024
Balancing Progress and Responsibility: A Synthesis of Sustainability Trade-Offs of AI-Based Systems

Apoorva Nalini Pradeep Kumar, Justus Bogner, Markus Funke et al.

Recent advances in artificial intelligence (AI) capabilities have increased the eagerness of companies to integrate AI into software systems. While AI can be used to have a positive impact on several dimensions of sustainability, this is often overshadowed by its potential negative influence. While many studies have explored sustainability factors in isolation, there is insufficient holistic coverage of potential sustainability benefits or costs that practitioners need to consider during decision-making for AI adoption. We therefore aim to synthesize trade-offs related to sustainability in the context of integrating AI into software systems. We want to make the sustainability benefits and costs of integrating AI more transparent and accessible for practitioners. The study was conducted in collaboration with a Dutch financial organization. We first performed a rapid review that led to the inclusion of 151 research papers. Afterward, we conducted six semi-structured interviews to enrich the data with industry perspectives. The combined results showcase the potential sustainability benefits and costs of integrating AI. The labels synthesized from the review regarding potential sustainability benefits were clustered into 16 themes, with "energy management" being the most frequently mentioned one. 11 themes were identified in the interviews, with the top mentioned theme being "employee wellbeing". Regarding sustainability costs, the review discovered seven themes, with "deployment issues" being the most popular one, followed by "ethics & society". "Environmental issues" was the top theme from the interviews. Our results provide valuable insights to organizations and practitioners for understanding the potential sustainability implications of adopting AI.

SEFeb 1, 2025
How Do Model Export Formats Impact the Development of ML-Enabled Systems? A Case Study on Model Integration

Shreyas Kumar Parida, Ilias Gerostathopoulos, Justus Bogner

Machine learning (ML) models are often integrated into ML-enabled systems to provide software functionality that would otherwise be impossible. This integration requires the selection of an appropriate ML model export format, for which many options are available. These formats are crucial for ensuring a seamless integration, and choosing a suboptimal one can negatively impact system development. However, little evidence is available to guide practitioners during the export format selection. We therefore evaluated various model export formats regarding their impact on the development of ML-enabled systems from an integration perspective. Based on the results of a preliminary questionnaire survey (n=17), we designed an extensive embedded case study with two ML-enabled systems in three versions with different technologies. We then analyzed the effect of five popular export formats, namely ONNX, Pickle, TensorFlow's SavedModel, PyTorch's TorchScript, and Joblib. In total, we studied 30 units of analysis (2 systems x 3 tech stacks x 5 formats) and collected data via structured field notes. The holistic qualitative analysis of the results indicated that ONNX offered the most efficient integration and portability across most cases. SavedModel and TorchScript were very convenient to use in Python-based systems, but otherwise required workarounds (TorchScript more than SavedModel). SavedModel also allowed the easy incorporation of preprocessing logic into a single file, which made it scalable for complex deep learning use cases. Pickle and Joblib were the most challenging to integrate, even in Python-based systems. Regarding technical support, all model export formats had strong technical documentation and strong community support across platforms such as Stack Overflow and Reddit. Practitioners can use our findings to inform the selection of ML export formats suited to their context.

CYMay 12, 2025
How Do Companies Manage the Environmental Sustainability of AI? An Interview Study About Green AI Efforts and Regulations

Ashmita Sampatsing, Sophie Vos, Emma Beauxis-Aussalet et al.

With the ever-growing adoption of artificial intelligence (AI), AI-based software and its negative impact on the environment are no longer negligible, and studying and mitigating this impact has become a critical area of research. However, it is currently unclear which role environmental sustainability plays during AI adoption in industry and how AI regulations influence Green AI practices and decision-making in industry. We therefore aim to investigate the Green AI perception and management of industry practitioners. To this end, we conducted a total of 11 interviews with participants from 10 different organizations that adopted AI-based software. The interviews explored three main themes: AI adoption, current efforts in mitigating the negative environmental impact of AI, and the influence of the EU AI Act and the Corporate Sustainability Reporting Directive (CSRD). Our findings indicate that 9 of 11 participants prioritized business efficiency during AI adoption, with minimal consideration of environmental sustainability. Monitoring and mitigation of AI's environmental impact were very limited. Only one participant monitored negative environmental effects. Regarding applied mitigation practices, six participants reported no actions, with the others sporadically mentioning techniques like prompt engineering, relying on smaller models, or not overusing AI. Awareness and compliance with the EU AI Act are low, with only one participant reporting on its influence, while the CSRD drove sustainability reporting efforts primarily in larger companies. All in all, our findings reflect a lack of urgency and priority for sustainable AI among these companies. We suggest that current regulations are not very effective, which has implications for policymakers. Additionally, there is a need to raise industry awareness, but also to provide user-friendly techniques and tools for Green AI practices.

SEMay 25, 2023
AI Techniques in the Microservices Life-Cycle: A Systematic Mapping Study

Sergio Moreschini, Shahrzad Pour, Ivan Lanese et al.

The use of AI in microservices (MSs) is an emerging field as indicated by a substantial number of surveys. However these surveys focus on a specific problem using specific AI techniques, therefore not fully capturing the growth of research and the rise and disappearance of trends. In our systematic mapping study, we take an exhaustive approach to reveal all possible connections between the use of AI techniques for improving any quality attribute (QA) of MSs during the DevOps phases. Our results include 16 research themes that connect to the intersection of particular QAs, AI domains and DevOps phases. Moreover by mapping identified future research challenges and relevant industry domains, we can show that many studies aim to deliver prototypes to be automated at a later stage, aiming at providing exploitable products in a number of key industry domains.

LGMay 18, 2023
Exploring the Carbon Footprint of Hugging Face's ML Models: A Repository Mining Study

Joel Castaño, Silverio Martínez-Fernández, Xavier Franch et al.

The rise of machine learning (ML) systems has exacerbated their carbon footprint due to increased capabilities and model sizes. However, there is scarce knowledge on how the carbon footprint of ML models is actually measured, reported, and evaluated. In light of this, the paper aims to analyze the measurement of the carbon footprint of 1,417 ML models and associated datasets on Hugging Face, which is the most popular repository for pretrained ML models. The goal is to provide insights and recommendations on how to report and optimize the carbon efficiency of ML models. The study includes the first repository mining study on the Hugging Face Hub API on carbon emissions. This study seeks to answer two research questions: (1) how do ML model creators measure and report carbon emissions on Hugging Face Hub?, and (2) what aspects impact the carbon emissions of training ML models? The study yielded several key findings. These include a stalled proportion of carbon emissions-reporting models, a slight decrease in reported carbon footprint on Hugging Face over the past 2 years, and a continued dominance of NLP as the main application domain. Furthermore, the study uncovers correlations between carbon emissions and various attributes such as model size, dataset size, and ML application domains. These results highlight the need for software measurements to improve energy reporting practices and promote carbon-efficient model development within the Hugging Face community. In response to this issue, two classifications are proposed: one for categorizing models based on their carbon emission reporting practices and another for their carbon efficiency. The aim of these classification proposals is to foster transparency and sustainable model development within the ML community.

SEJan 10, 2022
Designing Microservice Systems Using Patterns: An Empirical Study on Quality Trade-Offs

Guilherme Vale, Filipe Figueiredo Correia, Eduardo Martins Guerra et al.

The promise of increased agility, autonomy, scalability, and reusability has made the microservices architecture a \textit{de facto} standard for the development of large-scale and cloud-native commercial applications. Software patterns are an important design tool, and often they are selected and combined with the goal of obtaining a set of desired quality attributes. However, from a research standpoint, many patterns have not been widely validated against industry practice, making them not much more than interesting theories. To address this, we investigated how practitioners perceive the impact of 14 patterns on 7 quality attributes. Hence, we conducted 9 semi-structured interviews to collect industry expertise regarding (1) knowledge and adoption of software patterns, (2) the perceived architectural trade-offs of patterns, and (3) metrics professionals use to measure quality attributes. We found that many of the trade-offs reported in our study matched the documentation of each respective pattern, and identified several gains and pains which have not yet been reported, leading to novel insight about microservice patterns.

SEJul 30, 2021
Which RESTful API Design Rules Are Important and How Do They Improve Software Quality? A Delphi Study with Industry Experts

Sebastian Kotstein, Justus Bogner

Several studies analyzed existing Web APIs against the constraints of REST to estimate the degree of REST compliance among state-of-the-art APIs. These studies revealed that only a small number of Web APIs are truly RESTful. Moreover, identified mismatches between theoretical REST concepts and practical implementations lead us to believe that practitioners perceive many rules and best practices aligned with these REST concepts differently in terms of their importance and impact on software quality. We therefore conducted a Delphi study in which we confronted eight Web API experts from industry with a catalog of 82 REST API design rules. For each rule, we let them rate its importance and software quality impact. As consensus, our experts rated 28 rules with high, 17 with medium, and 37 with low importance. Moreover, they perceived usability, maintainability, and compatibility as the most impacted quality attributes. The detailed analysis revealed that the experts saw rules for reaching Richardson maturity level 2 as critical, while reaching level 3 was less important. As the acquired consensus data may serve as valuable input for designing a tool-supported approach for the automatic quality evaluation of RESTful APIs, we briefly discuss requirements for such an approach and comment on the applicability of the most important rules.

SEMay 5, 2021
Software Engineering for AI-Based Systems: A Survey

Silverio Martínez-Fernández, Justus Bogner, Xavier Franch et al.

AI-based systems are software systems with functionalities enabled by at least one AI component (e.g., for image- and speech-recognition, and autonomous driving). AI-based systems are becoming pervasive in society due to advances in AI. However, there is limited synthesized knowledge on Software Engineering (SE) approaches for building, operating, and maintaining AI-based systems. To collect and analyze state-of-the-art knowledge about SE for AI-based systems, we conducted a systematic mapping study. We considered 248 studies published between January 2010 and March 2020. SE for AI-based systems is an emerging research area, where more than 2/3 of the studies have been published since 2018. The most studied properties of AI-based systems are dependability and safety. We identified multiple SE approaches for AI-based systems, which we classified according to the SWEBOK areas. Studies related to software testing and software quality are very prevalent, while areas like software maintenance seem neglected. Data-related issues are the most recurrent challenges. Our results are valuable for: researchers, to quickly understand the state of the art and learn which topics need more research; practitioners, to learn about the approaches and challenges that SE entails for AI-based systems; and, educators, to bridge the gap among SE and AI in their curricula.

SEMar 17, 2021
Characterizing Technical Debt and Antipatterns in AI-Based Systems: A Systematic Mapping Study

Justus Bogner, Roberto Verdecchia, Ilias Gerostathopoulos

Background: With the rising popularity of Artificial Intelligence (AI), there is a growing need to build large and complex AI-based systems in a cost-effective and manageable way. Like with traditional software, Technical Debt (TD) will emerge naturally over time in these systems, therefore leading to challenges and risks if not managed appropriately. The influence of data science and the stochastic nature of AI-based systems may also lead to new types of TD or antipatterns, which are not yet fully understood by researchers and practitioners. Objective: The goal of our study is to provide a clear overview and characterization of the types of TD (both established and new ones) that appear in AI-based systems, as well as the antipatterns and related solutions that have been proposed. Method: Following the process of a systematic mapping study, 21 primary studies are identified and analyzed. Results: Our results show that (i) established TD types, variations of them, and four new TD types (data, model, configuration, and ethics debt) are present in AI-based systems, (ii) 72 antipatterns are discussed in the literature, the majority related to data and model deficiencies, and (iii) 46 solutions have been proposed, either to address specific TD types, antipatterns, or TD in general. Conclusions: Our results can support AI professionals with reasoning about and communicating aspects of TD present in their systems. Additionally, they can serve as a foundation for future research to further our understanding of TD in AI-based systems.

SEJan 29, 2021
Résumé-Driven Development: A Definition and Empirical Characterization

Jonas Fritzsch, Marvin Wyrich, Justus Bogner et al.

Technologies play an important role in the hiring process for software professionals. Within this process, several studies revealed misconceptions and bad practices which lead to suboptimal recruitment experiences. In the same context, grey literature anecdotally coined the term Résumé-Driven Development (RDD), a phenomenon describing the overemphasis of trending technologies in both job offerings and resumes as an interaction between employers and applicants. While RDD has been sporadically mentioned in books and online discussions, there are so far no scientific studies on the topic, despite its potential negative consequences. We therefore empirically investigated this phenomenon by surveying 591 software professionals in both hiring (130) and technical (558) roles and identified RDD facets in substantial parts of our sample: 60% of our hiring professionals agreed that trends influence their job offerings, while 82% of our software professionals believed that using trending technologies in their daily work makes them more attractive for prospective employers. Grounded in the survey results, we conceptualize a theory to frame and explain Résumé-Driven Development. Finally, we discuss influencing factors and consequences and propose a definition of the term. Our contribution provides a foundation for future research and raises awareness for a potentially systemic trend that may broadly affect the software industry.

SEJul 20, 2020
Collecting Service-Based Maintainability Metrics from RESTful API Descriptions: Static Analysis and Threshold Derivation

Justus Bogner, Stefan Wagner, Alfred Zimmermann

While many maintainability metrics have been explicitly designed for service-based systems, tool-supported approaches to automatically collect these metrics are lacking. Especially in the context of microservices, decentralization and technological heterogeneity may pose challenges for static analysis. We therefore propose the modular and extensible RAMA approach (RESTful API Metric Analyzer) to calculate such metrics from machine-readable interface descriptions of RESTful services. We also provide prototypical tool support, the RAMA CLI, which currently parses the formats OpenAPI, RAML, and WADL and calculates 10 structural service-based metrics proposed in scientific literature. To make RAMA measurement results more actionable, we additionally designed a repeatable benchmark for quartile-based threshold ranges (green, yellow, orange, red). In an exemplary run, we derived thresholds for all RAMA CLI metrics from the interface descriptions of 1,737 publicly available RESTful APIs. Researchers and practitioners can use RAMA to evaluate the maintainability of RESTful services or to support the empirical evaluation of new service interface metrics.

SEJul 12, 2020
Determining Microservice Boundaries: A Case Study Using Static and Dynamic Software Analysis

Tiago Matias, Filipe F. Correia, Jonas Fritzsch et al.

A number of approaches have been proposed to identify service boundaries when decomposing a monolith to microservices. However, only a few use systematic methods and have been demonstrated with replicable empirical studies. We describe a systematic approach for refactoring systems to microservice architectures that uses static analysis to determine the system's structure and dynamic analysis to understand its actual behavior. A prototype of a tool was built using this approach (MonoBreaker) and was used to conduct a case study on a real-world software project. The goal was to assess the feasibility and benefits of a systematic approach to decomposition that combines static and dynamic analysis. The three study participants regarded as positive the decomposition proposed by our tool, and considered that it showed improvements over approaches that rely only on static analysis.

SEJun 12, 2019
Assuring the Evolvability of Microservices: Insights into Industry Practices and Challenges

Justus Bogner, Jonas Fritzsch, Stefan Wagner et al.

While Microservices promise several beneficial characteristics for sustainable long-term software evolution, little empirical research covers what concrete activities industry applies for the evolvability assurance of Microservices and how technical debt is handled in such systems. Since insights into the current state of practice are very important for researchers, we performed a qualitative interview study to explore applied evolvability assurance processes, the usage of tools, metrics, and patterns, as well as participants' reflections on the topic. In 17 semi-structured interviews, we discussed 14 different Microservice-based systems with software professionals from 10 companies and how the sustainable evolution of these systems was ensured. Interview transcripts were analyzed with a detailed coding system and the constant comparison method. We found that especially systems for external customers relied on central governance for the assurance. Participants saw guidelines like architectural principles as important to ensure a base consistency for evolvability. Interviewees also valued manual activities like code review or boy scouting, even though automation and tool support was described as very important. Source code quality was the primary target for the usage of tools and metrics. Despite most reported issues being related to Architectural Technical Debt (ATD), our participants did not apply any architectural or service-oriented tools and metrics. While participants generally saw their Microservices as evolvable, service cutting and finding an appropriate service granularity with low coupling and high cohesion were reported as challenging. Future Microservices research in the areas of evolution and technical debt should take these findings and industry sentiments into account.

SEJun 11, 2019
Microservices Migration in Industry: Intentions, Strategies, and Challenges

Jonas Fritzsch, Justus Bogner, Stefan Wagner et al.

To remain competitive in a fast changing environment, many companies started to migrate their legacy applications towards a Microservices architecture. Such extensive migration processes require careful planning and consideration of implications and challenges likewise. In this regard, hands-on experiences from industry practice are still rare. To fill this gap in scientific literature, we contribute a qualitative study on intentions, strategies, and challenges in the context of migrations to Microservices. We investigated the migration process of 14 systems across different domains and sizes by conducting 16 in-depth interviews with software professionals from 10 companies. We present a separate description of each case and summarize the most important findings. As primary migration drivers, maintainability and scalability were identified. Due to the high complexity of their legacy systems, most companies preferred a rewrite using current technologies over splitting up existing code bases. This was often caused by the absence of a suitable decomposition approach. As such, finding the right service cut was a major technical challenge, next to building the necessary expertise with new technologies. Organizational challenges were especially related to large, traditional companies that simultaneously established agile processes. Initiating a mindset change and ensuring smooth collaboration between teams were crucial for them. Future research on the evolution of software systems will in particular profit from the individual cases presented.

SEJul 26, 2018
From Monolith to Microservices: A Classification of Refactoring Approaches

Jonas Fritzsch, Justus Bogner, Alfred Zimmermann et al.

While the recently emerged Microservices architectural style is widely discussed in literature, it is difficult to find clear guidance on the process of refactoring legacy applications. The importance of the topic is underpinned by high costs and effort of a refactoring process which has several other implications, e.g. overall processes (DevOps) and team structure. Software architects facing this challenge are in need of selecting an appropriate strategy and refactoring technique. One of the most discussed aspects in this context is finding the right service granularity to fully leverage the advantages of a Microservices architecture. This study first discusses the notion of architectural refactoring and subsequently compares 10 existing refactoring approaches recently proposed in academic literature. The approaches are classified by the underlying decomposition technique and visually presented in the form of a decision guide for quick reference. The review yielded a variety of strategies to break down a monolithic application into independent services. With one exception, most approaches are only applicable under certain conditions. Further concerns are the significant amount of input data some approaches require as well as limited or prototypical tool support.