Michael Felderer

SE
h-index47
51papers
2,105citations
Novelty18%
AI Score36

51 Papers

SEJan 21, 2023
The Pipeline for the Continuous Development of Artificial Intelligence Models -- Current State of Research and Practice

Monika Steidl, Michael Felderer, Rudolf Ramler

Companies struggle to continuously develop and deploy AI models to complex production systems due to AI characteristics while assuring quality. To ease the development process, continuous pipelines for AI have become an active research area where consolidated and in-depth analysis regarding the terminology, triggers, tasks, and challenges is required. This paper includes a Multivocal Literature Review where we consolidated 151 relevant formal and informal sources. In addition, nine-semi structured interviews with participants from academia and industry verified and extended the obtained information. Based on these sources, this paper provides and compares terminologies for DevOps and CI/CD for AI, MLOps, (end-to-end) lifecycle management, and CD4ML. Furthermore, the paper provides an aggregated list of potential triggers for reiterating the pipeline, such as alert systems or schedules. In addition, this work uses a taxonomy creation strategy to present a consolidated pipeline comprising tasks regarding the continuous development of AI. This pipeline consists of four stages: Data Handling, Model Learning, Software Development and System Operations. Moreover, we map challenges regarding pipeline implementation, adaption, and usage for the continuous development of AI to these four stages.

SEMar 19, 2022
Data Smells: Categories, Causes and Consequences, and Detection of Suspicious Data in AI-based Systems

Harald Foidl, Michael Felderer, Rudolf Ramler

High data quality is fundamental for today's AI-based systems. However, although data quality has been an object of research for decades, there is a clear lack of research on potential data quality issues (e.g., ambiguous, extraneous values). These kinds of issues are latent in nature and thus often not obvious. Nevertheless, they can be associated with an increased risk of future problems in AI-based systems (e.g., technical debt, data-induced faults). As a counterpart to code smells in software engineering, we refer to such issues as Data Smells. This article conceptualizes data smells and elaborates on their causes, consequences, detection, and use in the context of AI-based systems. In addition, a catalogue of 36 data smells divided into three categories (i.e., Believability Smells, Understandability Smells, Consistency Smells) is presented. Moreover, the article outlines tool support for detecting data smells and presents the result of an initial smell detection on more than 240 real-world datasets.

SEMar 23, 2022
What is Software Quality for AI Engineers? Towards a Thinning of the Fog

Valentina Golendukhina, Valentina Lenarduzzi, Michael Felderer

It is often overseen that AI-enabled systems are also software systems and therefore rely on software quality assurance (SQA). Thus, the goal of this study is to investigate the software quality assurance strategies adopted during the development, integration, and maintenance of AI/ML components and code. We conducted semi-structured interviews with representatives of ten Austrian SMEs that develop AI-enabled systems. A qualitative analysis of the interview data identified 12 issues in the development of AI/ML components. Furthermore, we identified when quality issues arise in AI/ML components and how they are detected. The results of this study should guide future work on software quality assurance processes and techniques for AI/ML components.

CRSep 18, 2023
VULNERLIZER: Cross-analysis Between Vulnerabilities and Software Libraries

Irdin Pekaric, Michael Felderer, Philipp Steinmüller

The identification of vulnerabilities is a continuous challenge in software projects. This is due to the evolution of methods that attackers employ as well as the constant updates to the software, which reveal additional issues. As a result, new and innovative approaches for the identification of vulnerable software are needed. In this paper, we present VULNERLIZER, which is a novel framework for cross-analysis between vulnerabilities and software libraries. It uses CVE and software library data together with clustering algorithms to generate links between vulnerabilities and libraries. In addition, the training of the model is conducted in order to reevaluate the generated associations. This is achieved by updating the assigned weights. Finally, the approach is then evaluated by making the predictions using the CVE data from the test set. The results show that the VULNERLIZER has a great potential in being able to predict future vulnerable libraries based on an initial input CVE entry or a software library. The trained model reaches a prediction accuracy of 75% or higher.

SEMay 16, 2022
Automatic Error Classification and Root Cause Determination while Replaying Recorded Workload Data at SAP HANA

Neetha Jambigi, Thomas Bach, Felix Schabernack et al.

Capturing customer workloads of database systems to replay these workloads during internal testing can be beneficial for software quality assurance. However, we experienced that such replays can produce a large amount of false positive alerts that make the results unreliable or time consuming to analyze. Therefore, we design a machine learning based approach that attributes root causes to the alerts. This provides several benefits for quality assurance and allows for example to classify whether an alert is true positive or false positive. Our approach considerably reduces manual effort and improves the overall quality assurance for the database system SAP HANA. We discuss the problem, the design and result of our approach, and we present practical limitations that may require further research.

SEJul 8, 2022
Guiding the retraining of convolutional neural networks against adversarial inputs

Francisco Durán López, Silverio Martínez-Fernández, Michael Felderer et al.

Background: When using deep learning models, there are many possible vulnerabilities and some of the most worrying are the adversarial inputs, which can cause wrong decisions with minor perturbations. Therefore, it becomes necessary to retrain these models against adversarial inputs, as part of the software testing process addressing the vulnerability to these inputs. Furthermore, for an energy efficient testing and retraining, data scientists need support on which are the best guidance metrics and optimal dataset configurations. Aims: We examined four guidance metrics for retraining convolutional neural networks and three retraining configurations. Our goal is to improve the models against adversarial inputs regarding accuracy, resource utilization and time from the point of view of a data scientist in the context of image classification. Method: We conducted an empirical study in two datasets for image classification. We explore: (a) the accuracy, resource utilization and time of retraining convolutional neural networks by ordering new training set by four different guidance metrics (neuron coverage, likelihood-based surprise adequacy, distance-based surprise adequacy and random), (b) the accuracy and resource utilization of retraining convolutional neural networks with three different configurations (from scratch and augmented dataset, using weights and augmented dataset, and using weights and only adversarial inputs). Results: We reveal that retraining with adversarial inputs from original weights and by ordering with surprise adequacy metrics gives the best model w.r.t. the used metrics. Conclusions: Although more studies are necessary, we recommend data scientists to use the above configuration and metrics to deal with the vulnerability to adversarial inputs of deep learning models, as they can improve their models against adversarial inputs without using many inputs.

29.9CRApr 6Code
Bridging Safety and Security in Complex Systems: A Model-Based Approach with SAFT-GT Toolchain

Irdin Pekaric, Raffaela Groner, Alexander Raschke et al.

In the rapidly evolving landscape of software engineering, the demand for robust and secure systems has become increasingly critical. This is especially true for self-adaptive systems due to their complexity and the dynamic environments in which they operate. To address this issue, we designed and developed the SAFT-GT toolchain that tackles the multifaceted challenges associated with ensuring both safety and security. This paper provides a comprehensive description of the toolchain's architecture and functionalities, including the Attack-Fault Trees generation and model combination approaches. We emphasize the toolchain's ability to integrate seamlessly with existing systems, allowing for enhanced safety and security analyses without requiring extensive modifications and domain knowledge. Our proposed approach can address evolving security threats, including both known vulnerabilities and emerging attack vectors that could compromise the system. As a use case for the toolchain, we integrate it into the feedback loop of self-adaptive systems. Finally, to validate the practical applicability of the toolchain, we conducted an extensive user study involving domain experts, whose insights and feedback underscore the toolchain's relevance and usability in real-world scenarios. Our findings demonstrate the toolchain's effectiveness in real-world applications while highlighting areas for future improvements. The toolchain and associated resources are available in an open-source repository to promote reproducibility and encourage further research in this field.

SEJan 29, 2025Code
Fault Localization via Fine-tuning Large Language Models with Mutation Generated Stack Traces

Neetha Jambigi, Bartosz Bogacz, Moritz Mueller et al.

Abrupt and unexpected terminations of software are termed as software crashes. They can be challenging to analyze. Finding the root cause requires extensive manual effort and expertise to connect information sources like stack traces, source code, and logs. Typical approaches to fault localization require either test failures or source code. Crashes occurring in production environments, such as that of SAP HANA, provide solely crash logs and stack traces. We present a novel approach to localize faults based only on the stack trace information and no additional runtime information, by fine-tuning large language models (LLMs). We address complex cases where the root cause of a crash differs from the technical cause, and is not located in the innermost frame of the stack trace. As the number of historic crashes is insufficient to fine-tune LLMs, we augment our dataset by leveraging code mutators to inject synthetic crashes into the code base. By fine-tuning on 64,369 crashes resulting from 4.1 million mutations of the HANA code base, we can correctly predict the root cause location of a crash with an accuracy of 66.9\% while baselines only achieve 12.6% and 10.6%. We substantiate the generalizability of our approach by evaluating on two additional open-source databases, SQLite and DuckDB, achieving accuracies of 63% and 74%, respectively. Across all our experiments, fine-tuning consistently outperformed prompting non-finetuned LLMs for localizing faults in our datasets.

SEMay 20, 2024
Naming the Pain in Machine Learning-Enabled Systems Engineering

Marcos Kalinowski, Daniel Mendez, Görkem Giray et al.

Context: Machine learning (ML)-enabled systems are being increasingly adopted by companies aiming to enhance their products and operational processes. Objective: This paper aims to deliver a comprehensive overview of the current status quo of engineering ML-enabled systems and lay the foundation to steer practically relevant and problem-driven academic research. Method: We conducted an international survey to collect insights from practitioners on the current practices and problems in engineering ML-enabled systems. We received 188 complete responses from 25 countries. We conducted quantitative statistical analyses on contemporary practices using bootstrapping with confidence intervals and qualitative analyses on the reported problems using open and axial coding procedures. Results: Our survey results reinforce and extend existing empirical evidence on engineering ML-enabled systems, providing additional insights into typical ML-enabled systems project contexts, the perceived relevance and complexity of ML life cycle phases, and current practices related to problem understanding, model deployment, and model monitoring. Furthermore, the qualitative analysis provides a detailed map of the problems practitioners face within each ML life cycle phase and the problems causing overall project failure. Conclusions: The results contribute to a better understanding of the status quo and problems in practical environments. We advocate for the further adaptation and dissemination of software engineering practices to enhance the engineering of ML-enabled systems.

LGDec 18, 2024
On Enhancing Root Cause Analysis with SQL Summaries for Failures in Database Workload Replays at SAP HANA

Neetha Jambigi, Joshua Hammesfahr, Moritz Mueller et al.

Capturing the workload of a database and replaying this workload for a new version of the database can be an effective approach for regression testing. However, false positive errors caused by many factors such as data privacy limitations, time dependency or non-determinism in multi-threaded environment can negatively impact the effectiveness. Therefore, we employ a machine learning based framework to automate the root cause analysis of failures found during replays. However, handling unseen novel issues not found in the training data is one general challenge of machine learning approaches with respect to generalizability of the learned model. We describe how we continue to address this challenge for more robust long-term solutions. From our experience, retraining with new failures is inadequate due to features overlapping across distinct root causes. Hence, we leverage a large language model (LLM) to analyze failed SQL statements and extract concise failure summaries as an additional feature to enhance the classification process. Our experiments show the F1-Macro score improved by 4.77% for our data. We consider our approach beneficial for providing end users with additional information to gain more insights into the found issues and to improve the assessment of the replay results.

SEFeb 15, 2022
Social Science Theories in Software Engineering Research

Tobias Lorey, Paul Ralph, Michael Felderer

As software engineering research becomes more concerned with the psychological, sociological and managerial aspects of software development, relevant theories from reference disciplines are increasingly important for understanding the field's core phenomena of interest. However, the degree to which software engineering research draws on relevant social sciences remains unclear. This study therefore investigates the use of social science theories in five influential software engineering journals over 13 years. It analyzes not only the extent of theory use but also what, how and where these theories are used. While 87 different theories are used, less than two percent of papers use a social science theory, most theories are used in only one paper, most social sciences are ignored, and the theories are rarely tested for applicability to software engineering contexts. Ignoring relevant social science theories may (1) undermine the community's ability to generate, elaborate and maintain a cumulative body of knowledge; and (2) lead to oversimplified models of software engineering phenomena. More attention to theory is needed for software engineering to mature as a scientific discipline.

SEJan 14, 2022
Cognition in Software Engineering: A Taxonomy and Survey of a Half-Century of Research

Fabian Fagerholm, Michael Felderer, Davide Fucci et al.

Cognition plays a fundamental role in most software engineering activities. This article provides a taxonomy of cognitive concepts and a survey of the literature since the beginning of the Software Engineering discipline. The taxonomy comprises the top-level concepts of perception, attention, memory, cognitive load, reasoning, cognitive biases, knowledge, social cognition, cognitive control, and errors, and procedures to assess them both qualitatively and quantitatively. The taxonomy provides a useful tool to filter existing studies, classify new studies, and support researchers in getting familiar with a (sub) area. In the literature survey, we systematically collected and analysed 311 scientific papers spanning five decades and classified them using the cognitive concepts from the taxonomy. Our analysis shows that the most developed areas of research correspond to the four life-cycle stages, software requirements, design, construction, and maintenance. Most research is quantitative and focuses on knowledge, cognitive load, memory, and reasoning. Overall, the state of the art appears fragmented when viewed from the perspective of cognition. There is a lack of use of cognitive concepts that would represent a coherent picture of the cognitive processes active in specific tasks. Accordingly, we discuss the research gap in each cognitive concept and provide recommendations for future research.

HCDec 1, 2021
Collaborative Artificial Intelligence Needs Stronger Assurances Driven by Risks

Jubril Gbolahan Adigun, Matteo Camilli, Michael Felderer et al.

Collaborative AI systems (CAISs) aim at working together with humans in a shared space to achieve a common goal. This critical setting yields hazardous circumstances that could harm human beings. Thus, building such systems with strong assurances of compliance with requirements, domain-specific standards and regulations is of greatest importance. Only few scale impact has been reported so far for such systems since much work remains to manage possible risks. We identify emerging problems in this context and then we report our vision, as well as the progress of our multidisciplinary research team composed of software/systems, and mechatronics engineers to develop a risk-driven assurance process for CAISs.

SESep 23, 2021
What Makes Agile Software Development Agile?

Marco Kuhrmann, Paolo Tell, Regina Hebig et al.

Together with many success stories, promises such as the increase in production speed and the improvement in stakeholders' collaboration have contributed to making agile a transformation in the software industry in which many companies want to take part. However, driven either by a natural and expected evolution or by contextual factors that challenge the adoption of agile methods as prescribed by their creator(s), software processes in practice mutate into hybrids over time. Are these still agile? In this article, we investigate the question: what makes a software development method agile? We present an empirical study grounded in a large-scale international survey that aims to identify software development methods and practices that improve or tame agility. Based on 556 data points, we analyze the perceived degree of agility in the implementation of standard project disciplines and its relation to used development methods and practices. Our findings suggest that only a small number of participants operate their projects in a purely traditional or agile manner (under 15%). That said, most project disciplines and most practices show a clear trend towards increasing degrees of agility. Compared to the methods used to develop software, the selection of practices has a stronger effect on the degree of agility of a given discipline. Finally, there are no methods or practices that explicitly guarantee or prevent agility. We conclude that agility cannot be defined solely at the process level. Additional factors need to be taken into account when trying to implement or improve agility in a software company. Finally, we discuss the field of software process-related research in the light of our findings and present a roadmap for future research.

SEApr 3, 2021
Aspects of Sustainable Test Processes

Armin Beer, Michael Felderer, Tobias Lorey et al.

Testing is a core software development activity that has huge potential to make software development more sustainable. In this paper, we discuss how environmental, social, economic, and technical sustainability map onto the activities of test planning, design, execution, and evaluation.

SEMar 12, 2021
Towards Risk Modeling for Collaborative AI

Matteo Camilli, Michael Felderer, Andrea Giusti et al.

Collaborative AI systems aim at working together with humans in a shared space to achieve a common goal. This setting imposes potentially hazardous circumstances due to contacts that could harm human beings. Thus, building such systems with strong assurances of compliance with requirements domain specific standards and regulations is of greatest importance. Challenges associated with the achievement of this goal become even more severe when such systems rely on machine learning components rather than such as top-down rule-based AI. In this paper, we introduce a risk modeling approach tailored to Collaborative AI systems. The risk model includes goals, risk events and domain specific indicators that potentially expose humans to hazards. The risk model is then leveraged to drive assurance methods that feed in turn the risk model through insights extracted from run-time evidence. Our envisioned approach is described by means of a running example in the domain of Industry 4.0, where a robotic arm endowed with a visual perception component, implemented with machine learning, collaborates with a human operator for a production-relevant task.

SEMar 2, 2021
Compliance Requirements in Large-Scale Software Development: An Industrial Case Study

Muhammad Usman, Michael Felderer, Michael Unterkalmsteiner et al.

Regulatory compliance is a well-studied area, including research on how to model, check, analyse, enact, and verify compliance of software. However, while the theoretical body of knowledge is vast, empirical evidence on challenges with regulatory compliance, as faced by industrial practitioners particularly in the Software Engineering domain, is still lacking. In this paper, we report on an industrial case study which aims at providing insights into common practices and challenges with checking and analysing regulatory compliance, and we discuss our insights in direct relation to the state of reported evidence. Our study is performed at Ericsson AB, a large telecommunications company, which must comply to both locally and internationally governing regulatory entities and standards such as GDPR. The main contributions of this work are empirical evidence on challenges experienced by Ericsson that complement the existing body of knowledge on regulatory compliance.

SEFeb 10, 2021
Quality Assurance for AI-based Systems: Overview and Challenges

Michael Felderer, Rudolf Ramler

The number and importance of AI-based systems in all domains is growing. With the pervasive use and the dependence on AI-based systems, the quality of these systems becomes essential for their practical usage. However, quality assurance for AI-based systems is an emerging area that has not been well explored and requires collaboration between the SE and AI research communities. This paper discusses terminology and challenges on quality assurance for AI-based systems to set a baseline for that purpose. Therefore, we define basic concepts and characterize AI-based systems along the three dimensions of artifact type, process, and quality characteristics. Furthermore, we elaborate on the key challenges of (1) understandability and interpretability of AI models, (2) lack of specifications and defined requirements, (3) need for validation data and test input generation, (4) defining expected outcomes as test oracles, (5) accuracy and correctness measures, (6) non-functional properties of AI-based systems, (7) self-adaptive and self-learning characteristics, and (8) dynamic and frequently changing environments.

SEFeb 10, 2021
Controlled Experimentation in Continuous Experimentation: Knowledge and Challenges

Florian Auer, Rasmus Ros, Lukas Kaltenbrunner et al.

Context: Continuous experimentation and A/B testing is an established industry practice that has been researched for more than 10 years. Our aim is to synthesize the conducted research. Objective: We wanted to find the core constituents of a framework for continuous experimentation and the solutions that are applied within the field. Finally, we were interested in the challenges and benefits reported of continuous experimentation. Method: We applied forward snowballing on a known set of papers and identified a total of 128 relevant papers. Based on this set of papers we performed two qualitative narrative syntheses and a thematic synthesis to answer the research questions. Results: The framework constituents for continuous experimentation include experimentation processes as well as supportive technical and organizational infrastructure. The solutions found in the literature were synthesized to nine themes, e.g. experiment design, automated experiments, or metric specification. Concerning the challenges of continuous experimentation, the analysis identified cultural, organizational, business, technical, statistical, ethical, and domain-specific challenges. Further, the study concludes that the benefits of experimentation are mostly implicit in the studies. Conclusions: The research on continuous experimentation has yielded a large body of knowledge on experimentation. The synthesis of published research presented within include recommended infrastructure and experimentation process models, guidelines to mitigate the identified challenges, and what problems the various published solutions solve.

SEJan 29, 2021
Catching up with Method and Process Practice: An Industry-Informed Baseline for Researchers

Jil Klünder, Regina Hebig, Paolo Tell et al.

Software development methods are usually not applied by the book. Companies are under pressure to continuously deploy software products that meet market needs and stakeholders' requests. To implement efficient and effective development processes, companies utilize multiple frameworks, methods and practices, and combine these into hybrid methods. A common combination contains a rich management framework to organize and steer projects complemented with a number of smaller practices providing the development teams with tools to complete their tasks. In this paper, based on 732 data points collected through an international survey, we study the software development process use in practice. Our results show that 76.8% of the companies implement hybrid methods. Company size as well as the strategy in devising and evolving hybrid methods affect the suitability of the chosen process to reach company or project goals. Our findings show that companies that combine planned improvement programs with process evolution can increase their process' suitability by up to 5%.

SEDec 25, 2020
Mining user reviews of COVID contact-tracing apps: An exploratory analysis of nine European apps

Vahid Garousi, David Cutting, Michael Felderer

Context: More than 50 countries have developed COVID contact-tracing apps to limit the spread of coronavirus. However, many experts and scientists cast doubt on the effectiveness of those apps. For each app, a large number of reviews have been entered by end-users in app stores. Objective: Our goal is to gain insights into the user reviews of those apps, and to find out the main problems that users have reported. Our focus is to assess the "software in society" aspects of the apps, based on user reviews. Method: We selected nine European national apps for our analysis and used a commercial app-review analytics tool to extract and mine the user reviews. For all the apps combined, our dataset includes 39,425 user reviews. Results: Results show that users are generally dissatisfied with the nine apps under study, except the Scottish ("Protect Scotland") app. Some of the major issues that users have complained about are high battery drainage and doubts on whether apps are really working. Conclusion: Our results show that more work is needed by the stakeholders behind the apps (e.g., app developers, decision-makers, public health experts) to improve the public adoption, software quality and public perception of these apps.

SEOct 7, 2020
Empirical Standards for Software Engineering Research

Paul Ralph, Nauman bin Ali, Sebastian Baltes et al.

Empirical Standards are natural-language models of a scientific community's expectations for a specific kind of study (e.g. a questionnaire survey). The ACM SIGSOFT Paper and Peer Review Quality Initiative generated empirical standards for research methods commonly used in software engineering. These living documents, which should be continuously revised to reflect evolving consensus around research best practices, will improve research quality and make peer review more effective, reliable, transparent and fair.

SESep 30, 2020
Retrieving and mining professional experience of software practice from grey literature: an exploratory review

Austen Rainer, Ashley Williams, Vahid Garousi et al.

Background: Retrieving and mining practitioners' self--reports of their professional experience of software practice could provide valuable evidence for research. We are, however, unaware of any existing reviews of research conducted in this area. Objective: To review and classify previous research, and to identify insights into the challenges research confronts when retrieving and mining practitioners' self-reports of their experience of software practice. Method: We conduct an exploratory review to identify and classify 42 articles. We analyse a selection of those articles for insights on challenges to mining professional experience. Results: We identify only one directly relevant article. Even then this article concerns the software professional's emotional experiences rather than the professional's reporting of behaviour and events occurring during software practice. We discuss challenges concerning: the prevalence of professional experience; definitions, models and theories; the sparseness of data; units of discourse analysis; annotator agreement; evaluation of the performance of algorithms; and the lack of replications. Conclusion: No directly relevant prior research appears to have been conducted in this area. We discuss the value of reporting negative results in secondary studies. There are a range of research opportunities but also considerable challenges. We formulate a set of guiding questions for further research in this area.

SEJul 20, 2020
Why Research on Test-Driven Development is Inconclusive?

Mohammad Ghafari, Timm Gross, Davide Fucci et al.

[Background] Recent investigations into the effects of Test-Driven Development (TDD) have been contradictory and inconclusive. This hinders development teams to use research results as the basis for deciding whether and how to apply TDD. [Aim] To support researchers when designing a new study and to increase the applicability of TDD research in the decision-making process in the industrial context, we aim at identifying the reasons behind the inconclusive research results in TDD. [Method] We studied the state of the art in TDD research published in top venues in the past decade, and analyzed the way these studies were set up. [Results] We identified five categories of factors that directly impact the outcome of studies on TDD. [Conclusions] This work can help researchers to conduct more reliable studies, and inform practitioners of risks they need to consider when consulting research on TDD.

SEMay 26, 2020
Assessing the maturity of software testing services using CMMI-SVC: An industrial case study

Vahid Garousi, Seyfettin Arkan, Gökhan Urul et al.

Context: While many companies conduct their software testing activities in-house, many other companies outsource their software testing needs to other firms who act as software testing service providers. As a result, Testing as a Service (TaaS) has emerged as a strong service industry in the last several decades. In the context of software testing services, there could be various challenges (e.g., during the planning and service delivery phases) and, as a result, the quality of testing services is not always as expected. Objective: It is important, for both providers and also customers of testing services, to assess the quality and maturity of test services and subsequently improve them. Method: Motivated by a real industrial need in the context of several testing service providers, to assess the maturity of their software testing services, we chose the existing CMMI for Services maturity model (CMMI-SVC), and conducted a case study using it in the context of two Turkish testing service providers. Results: The case-study results show that maturity appraisal of testing services using CMMI-SVC was helpful for both companies and their test management teams by enabling them objectively assess the maturity of their testing services and also by pinpointing potential improvement areas. Conclusion: We empirically observed that, after some minor customization, CMMI-SVC is indeed a suitable model for maturity appraisal of testing services.

SEFeb 25, 2020
Software Engineering und Software Engineering Forschung im Zeitalter der Digitalisierung

Michael Felderer, Ralf Reussner, Bernhard Rumpe

Digitization not only affects society, it also requires a redefinition of the location of computer science and computer scientists, as the science journalist Yogeshwar suggests. Since all official aspects of digitalization are based on software, this article is intended to attempt to redefine the role of software engineering and its research. Software-based products, systems or services are influencing all areas of life and are a critical component and central innovation driver of digitization in all areas of life. Scientifically, there are new opportunities and challenges for software engineering as a driving discipline in the development of any technical innovation. However, the chances must not be sacrificed to the competition for bibliometric numbers as an end in themselves.

SEDec 24, 2019
A taxonomy of risk-based testing

Michael Felderer, Ina Schieferdecker

Software testing has often to be done under severe pressure due to limited resources and a challenging time schedule facing the demand to assure the fulfillment of the software requirements. In addition, testing should unveil those software defects that harm the mission-critical functions of the software. Risk-based testing uses risk (re-)assessments to steer all phases of the test process in order to optimize testing efforts and limit risks of the software-based system. Due to its importance and high practical relevance several risk-based testing approaches were proposed in academia and industry. This paper presents a taxonomy of risk-based testing providing a framework to understand, categorize, assess, and compare risk-based testing approaches to support their selection and tailoring for specific purposes. The taxonomy is aligned with the consideration of risks in all phases of the test process and consists of the top-level classes risk drivers, risk assessment, and risk-based test process. The taxonomy of risk-based testing has been developed by analyzing the work presented in available publications on risk-based testing. Afterwards, it has been applied to the work on risk-based testing presented in this special section of the International Journal on Software Tools for Technology Transfer.

SEDec 24, 2019
The Evolution of Empirical Methods in Software Engineering

Michael Felderer, Guilherme Horta Travassos

Empirical methods like experimentation have become a powerful means to drive the field of software engineering by creating scientific evidence on software development, operation, and maintenance, but also by supporting practitioners in their decision making and learning. Today empirical methods are fully applied in software engineering. However, they have developed in several iterations since the 1960s. In this chapter we tell the history of empirical software engineering and present the evolution of empirical methods in software engineering in five iterations, i.e., (1) mid-1960s to mid-1970s, (2) mid-1970s to mid-1980s, (3) mid-1980s to end of the 1990s, (4) the 2000s, and (5) the 2010s. We present the five iterations of the development of empirical software engineering mainly from a methodological perspective and additionally take key papers, venues, and books, which are covered in chronological order in a separate section on recommended further readings, into account. We complement our presentation of the evolution of empirical software engineering by presenting the current situation and an outlook in Sect. 4 and the available books on empirical software engineering. Furthermore, based on the chapters covered in this book we discuss trends on contemporary empirical methods in software engineering related to the plurality of research methods, human factors, data collection and processing, aggregation and synthesis of evidence, and impact of software engineering research.

SENov 27, 2019
Benefitting from the Grey Literature in Software Engineering Research

Vahid Garousi, Michael Felderer, Mika V. Mäntylä et al.

Researchers generally place the most trust in peer-reviewed, published information, such as journals and conference papers. By contrast, software engineering (SE) practitioners typically do not have the time, access or expertise to review and benefit from such publications. As a result, practitioners are more likely to turn to other sources of information that they trust, e.g., trade magazines, online blog-posts, survey results or technical reports, collectively referred to as Grey Literature (GL). Furthermore, practitioners also share their ideas and experiences as GL, which can serve as a valuable data source for research. While GL itself is not a new topic in SE, using, benefitting and synthesizing knowledge from the GL in SE is a contemporary topic in empirical SE research and we are seeing that researchers are increasingly benefitting from the knowledge available within GL. The goal of this chapter is to provide an overview to GL in SE, together with insights on how SE researchers can effectively use and benefit from the knowledge and evidence available in the vast amount of GL.

SESep 27, 2019
Technical Debt and Waste in Non-Functional Requirements Documentation: An Exploratory Study

Gabriela Robiolo, Ezequiel Scott, Santiago Matalonga et al.

Background: To adequately attend to non-functional requirements (NFRs), they must be documented; otherwise, developers would not know about their existence. However, the documentation of NFRs may be subject to Technical Debt and Waste, as any other software artefact. Aims: The goal is to explore indicators of potential Technical Debt and Waste in NFRs documentation. Method: Based on a subset of data acquired from the most recent NaPiRE (Naming the Pain in Requirements Engineering) survey, we calculate, for a standard set of NFR types, how often respondents state they document a specific type of NFR when they also state that it is important. This allows us to quantify the occurrence of potential Technical Debt and Waste. Results: Based on 398 survey responses, four NFR types (Maintainability, Reliability, Usability, and Performance) are labelled as important but they are not documented by more than 22% of the respondents. We interpret that these NFR types have a higher risk of Technical Debt than other NFR types. Regarding Waste, 15% of the respondents state they document NFRs related to Security and they do not consider it important. Conclusions: There is a clear indication that there is a risk of Technical Debt for a fixed set of NFRs since there is a lack of documentation of important NFRs. The potential risk of incurring Waste is also present but to a lesser extent.

SESep 19, 2019
From Monolithic Systems to Microservices: An Assessment Framework

Florian Auer, Valentina Lenarduzzi, Michael Felderer et al.

Context. Re-architecting monolithic systems with Microservices-based architecture is a common trend. Various companies are migrating to Microservices for different reasons. However, making such an important decision like re-architecting an entire system must be based on real facts and not only on gut feelings. Objective. The goal of this work is to propose an evidence-based decision support framework for companies that need to migrate to Microservices, based on the analysis of a set of characteristics and metrics they should collect before re-architecting their monolithic system. Method. We designed this study with a mixed-methods approach combining a Systematic Mapping Study with a survey done in the form of interviews with professionals to derive the assessment framework based on Grounded Theory. Results. We identified a set consisting of information and metrics that companies can use to decide whether to migrate to Microservices or not. The proposed assessment framework, based on the aforementioned metrics, could be useful for companies if they need to migrate to Microservices and do not want to run the risk of failing to consider some important information.

SEAug 16, 2019
Challenges in Survey Research

Stefan Wagner, Daniel Méndez Fernández, Michael Felderer et al.

While being an important and often used research method, survey research has been less often discussed on a methodological level in empirical software engineering than other types of research. This chapter compiles a set of important and challenging issues in survey research based on experiences with several large-scale international surveys. The chapter covers theory building, sampling, invitation and follow-up, statistical as well as qualitative analysis of survey data and the usage of psychometrics in software engineering surveys.

SEMay 31, 2019
Technical Debt in Data-Intensive Software Systems

Harald Foidl, Michael Felderer, Stefan Biffl

The ever-increasing amount, variety as well as generation and processing speed of today's data pose a variety of new challenges for developing Data-Intensive Software Systems (DISS). As with developing other kinds of software systems, developing DISS is often done under severe pressure and strict schedules. Thus, developers of DISS often have to make technical compromises to meet business concerns. This position paper proposes a conceptual model that outlines where Technical Debt (TD) can emerge and proliferate within such data-centric systems by separating a DISS into three parts (Software Systems, Data Storage Systems and Data). Further, the paper illustrates the proliferation of Database Schema Smells as TD items within a relational database-centric software system based on two examples.

SEMay 25, 2019
A Taxonomy to Assess and Tailor Risk-based Testing in Recent Testing Standards

Jürgen Großmann, Michael Felderer, Johannes Viehmann et al.

This article provides a taxonomy for risk-based testing that serves as a tool to define, tailor, or assess risk-based testing approaches in general and to instantiate risk-based testing approaches for the current testing standards ISO/IEC/IEEE 29119, ETSI EG and OWASP Security Testing Guide in particular. We demonstrate the usefulness of the taxonomy by applying it to the aforementioned standards as well as to the risk-based testing approaches SmartTesting, RACOMAT, PRISMA and risk-based test case prioritization using fuzzy expert systems. In this setting, the taxonomy is used to systematically identify deviations between the standards' requirements and the individual testing approaches so that we are able to position and compare the testing approaches and discuss their potential for practical application.

AIApr 20, 2019
Specification-Driven Predictive Business Process Monitoring

Ario Santoso, Michael Felderer

Predictive analysis in business process monitoring aims at forecasting the future information of a running business process. The prediction is typically made based on the model extracted from historical process execution logs (event logs). In practice, different business domains might require different kinds of predictions. Hence, it is important to have a means for properly specifying the desired prediction tasks, and a mechanism to deal with these various prediction tasks. Although there have been many studies in this area, they mostly focus on a specific prediction task. This work introduces a language for specifying the desired prediction tasks, and this language allows us to express various kinds of prediction tasks. This work also presents a mechanism for automatically creating the corresponding prediction model based on the given specification. Differently from previous studies, instead of focusing on a particular prediction task, we present an approach to deal with various prediction tasks based on the given specification of the desired prediction tasks. We also provide an implementation of the approach which is used to conduct experiments using real-life event logs.

SEMar 22, 2019
On Testing Data-Intensive Software Systems

Michael Felderer, Barbara Russo, Florian Auer

Today's software systems like cyber-physical production systems or big data systems have to process large volumes and diverse types of data which heavily influences the quality of these so-called data-intensive systems. However, traditional software testing approaches rather focus on functional behavior than on data aspects. Therefore, the role of data in testing has to be rethought and specific testing approaches for data-intensive software systems are required. Thus, the aim of this chapter is to contribute to this area by (1) providing basic terminology and background on data-intensive software systems and their testing, and (2) presenting the state of the research and the hot topics in the area. Finally, the directions of research and the new frontiers on testing data-intensive software systems are discussed.

SEFeb 1, 2019
Do We Preach What We Practice? Investigating the Practical Relevance of Requirements Engineering Syllabi - The IREB Case

Daniel Méndez Fernández, Xavier Franch, Norbert Seyff et al.

Nowadays, there exist a plethora of different educational syllabi for Requirements Engineering (RE), all aiming at incorporating practically relevant educational units (EUs). Many of these syllabi are based, in one way or the other, on the syllabi provided by the International Requirements Engineering Board (IREB), a non-profit organisation devoted to standardised certification programs for RE. IREB syllabi are developed by RE experts and are, thus, based on the assumption that they address topics of practical relevance. However, little is known about to what extent practitioners actually perceive those contents as useful. We have started a study to investigate the relevance of the EUs included in the IREB Foundation Level certification programme. In a first phase reported in this paper, we have surveyed practitioners mainly from DACH countries (Germany, Austria and Switzerland) participating in the IREB certification. Later phases will widen the scope both by including other countries and by not requiring IREB-certified participants. The results shall foster a critical reflection on the practical relevance of EUs built upon the de-facto standard syllabus of IREB.

SEDec 5, 2018
Closing the gap between software engineering education and industrial needs

Vahid Garousi, Görkem Giray, Eray Tüzün et al.

According to different reports, many recent software engineering graduates often face difficulties when beginning their professional careers, due to misalignment of the skills learnt in their university education with what is needed in industry. To address that need, many studies have been conducted to align software engineering education with industry needs. To synthesize that body of knowledge, we present in this paper a systematic literature review (SLR) which summarizes the findings of 33 studies in this area. By doing a meta-analysis of all those studies and using data from 12 countries and over 4,000 data points, this study will enable educators and hiring managers to adapt their education / hiring efforts to best prepare the software engineering workforce.

SEDec 4, 2018
Verlässliche Software im 21. Jahrhundert

Stefan Wagner, Matthias Tichy, Michael Felderer et al.

Software is the main innovation driver in many different areas, like cloud services, autonomous driving, connected medical devices, and high-frequency trading. All these areas have in common that they require high dependability. In this paper, we discuss challenges and research directions imposed by these new areas on guaranteeing the dependability. On the one hand challenges include characteristics of the systems themselves, e. g., open systems and ad-hoc structures. On the other hand, we see new aspects of dependability like behavioral traceability.

SEJun 2, 2018
NLP-assisted software testing: A systematic mapping of the literature

Vahid Garousi, Sara Bauer, Michael Felderer

Context: To reduce manual effort of extracting test cases from natural-language requirements, many approaches based on Natural Language Processing (NLP) have been proposed in the literature. Given the large amount of approaches in this area, and since many practitioners are eager to utilize such techniques, it is important to synthesize and provide an overview of the state-of-the-art in this area. Objective: Our objective is to summarize the state-of-the-art in NLP-assisted software testing which could benefit practitioners to potentially utilize those NLP-based techniques. Moreover, this can benefit researchers in providing an overview of the research landscape. Method: To address the above need, we conducted a survey in the form of a systematic literature mapping (classification). After compiling an initial pool of 95 papers, we conducted a systematic voting, and our final pool included 67 technical papers. Results: This review paper provides an overview of the contribution types presented in the papers, types of NLP approaches used to assist software testing, types of required input requirements, and a review of tool support in this area. Some key results we have detected are: (1) only four of the 38 tools (11%) presented in the papers are available for download; (2) a larger ratio of the papers (30 of 67) provided a shallow exposure to the NLP aspects (almost no details). Conclusion: This paper would benefit both practitioners and researchers by serving as an "index" to the body of knowledge in this area. The results could help practitioners utilizing the existing NLP-based techniques; this in turn reduces the cost of test-case design and decreases the amount of human resources spent on test activities. After sharing this review with some of our industrial collaborators, initial insights show that this review can indeed be useful and beneficial to practitioners.

SEMay 21, 2018
Status Quo in Requirements Engineering: A Theory and a Global Family of Surveys

Stefan Wagner, Daniel Méndez Fernández, Michael Felderer et al.

Requirements Engineering (RE) has established itself as a software engineering discipline during the past decades. While researchers have been investigating the RE discipline with a plethora of empirical studies, attempts to systematically derive an empirically-based theory in context of the RE discipline have just recently been started. However, such a theory is needed if we are to define and motivate guidance in performing high quality RE research and practice. We aim at providing an empirical and valid foundation for a theory of RE, which helps software engineers establish effective and efficient RE processes. We designed a survey instrument and theory that has now been replicated in 10 countries world-wide. We evaluate the propositions of the theory with bootstrapped confidence intervals and derive potential explanations for the propositions. We report on the underlying theory and the full results obtained from the replication studies with participants from 228 organisations. Our results represent a substantial step forward towards developing an empirically-based theory of RE giving insights into current practices with RE processes. The results reveal, for example, that there are no strong differences between organisations in different countries and regions, that interviews, facilitated meetings and prototyping are the most used elicitation techniques, that requirements are often documented textually, that traces between requirements and code or design documents is common, requirements specifications themselves are rarely changed and that requirements engineering (process) improvement endeavours are mostly intrinsically motivated. Our study establishes a theory that can be used as starting point for many further studies for more detailed investigations. Practitioners can use the results as theory-supported guidance on selecting suitable RE methods and techniques.

SEJan 21, 2018
Recent Results on Classifying Risk-Based Testing Approaches

Michael Felderer, Juergen Grossmann, Ina Schieferdecker

In order to optimize the usage of testing efforts and to assess risks of software-based systems, risk-based testing uses risk (re-)assessments to steer all phases in a test process. Several risk-based testing approaches have been proposed in academia and/or applied in industry, so that the determination of principal concepts and methods in risk-based testing is needed to enable a comparison of the weaknesses and strengths of different risk-based testing approaches. In this chapter we provide an (updated) taxonomy of risk-based testing aligned with risk considerations in all phases of a test process. It consists of three top-level classes, i.e., contextual setup, risk assessment, and risk-based test strategy. This taxonomy provides a framework to understand, categorize, assess and compare risk-based testing approaches to support their selection and tailoring for specific purposes. Furthermore, we position four recent risk-based testing approaches into the taxonomy in order to demonstrate its application and alignment with available risk-based testing approaches.

SEJan 21, 2018
Guidelines for Systematic Mapping Studies in Security Engineering

Michael Felderer, Jeffrey C. Carver

Security engineering in the software lifecycle aims at protecting information and systems to guarantee confidentiality, integrity, and availability. As security engineering matures and the number of research papers grows, there is an increasing need for papers that summarize results and provide an overview of the area. A systematic mapping study "maps" a research area by classifying papers to identify which topics are well-studied and which need additional study. Therefore, systematic mapping studies are becoming increasingly important in security engineering. This chapter provides methodological support for systematic mapping studies in security engineering based on examples from published security engineering papers. Because security engineering is similar to software engineering in that it bridges research and practice, researchers can use the same basic systematic mapping process, as follows: (1) study planning, (2) searching for studies, (3) study selection, (4) study quality assessment, (5) data extraction, (6) data classification, (7) data analysis, and (8) reporting of results. We use published mapping studies to describe the tailoring of this process for security engineering. In addition to guidance on how to perform systematic mapping studies in security engineering, this chapter should increase awareness in the security engineering community of the need for additional mapping studies.

SEJan 7, 2018
A survey on software testability

Vahid Garousi, Michael Felderer, Feyza Nur Kilicaslan

Context: Software testability is the degree to which a software system or a unit under test supports its own testing. To predict and improve software testability, a large number of techniques and metrics have been proposed by both practitioners and researchers in the last several decades. Reviewing and getting an overview of the entire state-of-the-art and state-of-the-practice in this area is often challenging for a practitioner or a new researcher. Objective: Our objective is to summarize the body of knowledge in this area and to benefit the readers (both practitioners and researchers) in preparing, measuring and improving software testability. Method: To address the above need, the authors conducted a survey in the form of a systematic literature mapping (classification) to find out what we as a community know about this topic. After compiling an initial pool of 303 papers, and applying a set of inclusion/exclusion criteria, our final pool included 208 papers. Results: The area of software testability has been comprehensively studied by researchers and practitioners. Approaches for measurement of testability and improvement of testability are the most-frequently addressed in the papers. The two most often mentioned factors affecting testability are observability and controllability. Common ways to improve testability are testability transformation, improving observability, adding assertions, and improving controllability. Conclusion: This paper serves for both researchers and practitioners as an "index" to the vast body of knowledge in the area of testability. The results could help practitioners measure and improve software testability in their projects.

SEJul 9, 2017
Guidelines for including grey literature and conducting multivocal literature reviews in software engineering

Vahid Garousi, Michael Felderer, Mika V. Mäntylä

Context: A Multivocal Literature Review (MLR) is a form of a Systematic Literature Review (SLR) which includes the grey literature (e.g., blog posts and white papers) in addition to the published (formal) literature (e.g., journal and conference papers). MLRs are useful for both researchers and practitioners since they provide summaries both the state-of-the art and -practice in a given area. Objective: There are several guidelines to conduct SLR studies in SE. However, given the facts that several phases of MLRs differ from those of traditional SLRs, for instance with respect to the search process and source quality assessment. Therefore, SLR guidelines are only partially useful for conducting MLR studies. Our goal in this paper is to present guidelines on how to conduct MLR studies in SE. Method: To develop the MLR guidelines, we benefit from three inputs: (1) existing SLR guidelines in SE, (2), a literature survey of MLR guidelines and experience papers in other fields, and (3) our own experiences in conducting several MLRs in SE. All derived guidelines are discussed in the context of three examples MLRs as running examples (two from SE and one MLR from the medical sciences). Results: The resulting guidelines cover all phases of conducting and reporting MLRs in SE from the planning phase, over conducting the review to the final reporting of the review. In particular, we believe that incorporating and adopting a vast set of recommendations from MLR guidelines and experience papers in other fields have enabled us to propose a set of guidelines with solid foundations. Conclusion: Having been developed on the basis of three types of solid experience and evidence, the provided MLR guidelines support researchers to effectively and efficiently conduct new MLRs in any area of SE.

SEJul 1, 2017
On Evidence-based Risk Management in Requirements Engineering

Daniel Méndez Fernández, Michaela Tießler, Marcos Kalinowski et al.

Background: The sensitivity of Requirements Engineering (RE) to the context makes it difficult to efficiently control problems therein, thus, hampering an effective risk management devoted to allow for early corrective or even preventive measures. Problem: There is still little empirical knowledge about context-specific RE phenomena which would be necessary for an effective context- sensitive risk management in RE. Goal: We propose and validate an evidence-based approach to assess risks in RE using cross-company data about problems, causes and effects. Research Method: We use survey data from 228 companies and build a probabilistic network that supports the forecast of context-specific RE phenomena. We implement this approach using spreadsheets to support a light-weight risk assessment. Results: Our results from an initial validation in 6 companies strengthen our confidence that the approach increases the awareness for individual risk factors in RE, and the feedback further allows for disseminating our approach into practice.

SEMar 24, 2017
Requirements Engineering Practice and Problems in Agile Projects: Results from an International Survey

Stefan Wagner, Daniel Méndez Fernández, Michael Felderer et al.

Requirements engineering (RE) is considerably different in agile development than in more traditional development processes. Yet, there is little empirical knowledge on the state of the practice and contemporary problems in agile RE. As part of a bigger survey initiative (Naming the Pain in Requirements Engineering), we build an empirical basis on such aspects of agile RE. Based on the responses of representatives from 92 different organisations, we found that agile RE concentrates on free-text documentation of requirements elicited with a variety of techniques. Often, traces between requirements and code are explicitly managed and also software testing and RE are aligned. Furthermore, continuous improvement of RE is performed due to intrinsic motivation. Important experienced problems include unclear requirements and communication flaws. Overall, we found that most organisations conduct RE in a way we would expect and that agile RE is in several aspects not so different from RE in other development processes.

SEFeb 13, 2017
Supporting Defect Causal Analysis in Practice with Cross-Company Data on Causes of Requirements Engineering Problems

Marcos Kalinowski, Pablo Curty, Aline Paes et al.

[Context] Defect Causal Analysis (DCA) represents an efficient practice to improve software processes. While knowledge on cause-effect relations is helpful to support DCA, collecting cause-effect data may require significant effort and time. [Goal] We propose and evaluate a new DCA approach that uses cross-company data to support the practical application of DCA. [Method] We collected cross-company data on causes of requirements engineering problems from 74 Brazilian organizations and built a Bayesian network. Our DCA approach uses the diagnostic inference of the Bayesian network to support DCA sessions. We evaluated our approach by applying a model for technology transfer to industry and conducted three consecutive evaluations: (i) in academia, (ii) with industry representatives of the Fraunhofer Project Center at UFBA, and (iii) in an industrial case study at the Brazilian National Development Bank (BNDES). [Results] We received positive feedback in all three evaluations and the cross-company data was considered helpful for determining main causes. [Conclusions] Our results strengthen our confidence in that supporting DCA with cross-company data is promising and should be further investigated.