Michael Kläs

h-index16

17papers

382citations

Novelty30%

AI Score24

Ranked #176,394 of 201,326 authors (top 88%)#2,388 in SE (top 70%)

17 Papers

LGJul 10, 2023Code

Badgers: generating data quality deficits with Python

Julien Siebert, Daniel Seifert, Patricia Kelbert et al.

Generating context specific data quality deficits is necessary to experimentally assess data quality of data-driven (artificial intelligence (AI) or machine learning (ML)) applications. In this paper we present badgers, an extensible open-source Python library to generate data quality deficits (outliers, imbalanced data, drift, etc.) for different modalities (tabular data, time-series, text, etc.). The documentation is accessible at https://fraunhofer-iese.github.io/badgers/ and the source code at https://github.com/Fraunhofer-IESE/badgers

SEJun 14, 2022

Architectural patterns for handling runtime uncertainty of data-driven models in safety-critical perception

Janek Groß, Rasmus Adler, Michael Kläs et al.

Data-driven models (DDM) based on machine learning and other AI techniques play an important role in the perception of increasingly autonomous systems. Due to the merely implicit definition of their behavior mainly based on the data used for training, DDM outputs are subject to uncertainty. This poses a challenge with respect to the realization of safety-critical perception tasks by means of DDMs. A promising approach to tackling this challenge is to estimate the uncertainty in the current situation during operation and adapt the system behavior accordingly. In previous work, we focused on runtime estimation of uncertainty and discussed approaches for handling uncertainty estimations. In this paper, we present additional architectural patterns for handling uncertainty. Furthermore, we evaluate the four patterns qualitatively and quantitatively with respect to safety and performance gains. For the quantitative evaluation, we consider a distance controller for vehicle platooning where performance gains are measured by considering how much the distance can be reduced in different operational situations. We conclude that the consideration of context information of the driving situation makes it possible to accept more or less uncertainty depending on the inherent risk of the situation, which results in performance gains.

LGNov 9, 2023

Uncertainty Wrapper in the medical domain: Establishing transparent uncertainty quantification for opaque machine learning models in practice

Lisa Jöckel, Michael Kläs, Georg Popp et al.

When systems use data-based models that are based on machine learning (ML), errors in their results cannot be ruled out. This is particularly critical if it remains unclear to the user how these models arrived at their decisions and if errors can have safety-relevant consequences, as is often the case in the medical field. In such cases, the use of dependable methods to quantify the uncertainty remaining in a result allows the user to make an informed decision about further usage and draw possible conclusions based on a given result. This paper demonstrates the applicability and practical utility of the Uncertainty Wrapper using flow cytometry as an application from the medical field that can benefit from the use of ML models in conjunction with dependable and transparent uncertainty quantification.

SENov 28, 2016Code

Operationalised product quality models and assessment: The Quamoco approach

Stefan Wagner, Andreas Goeb, Lars Heinemann et al.

Software quality models provide either abstract quality characteristics or concrete quality measurements; there is no seamless integration of these two aspects. Reasons for this include the complexity of quality and the various quality profiles in different domains which make it difficult to build operationalised quality models. In the project Quamoco, we developed a comprehensive approach for closing this gap. It combined constructive research, which involved quality experts from academia and industry in workshops, sprint work and reviews, with empirical studies. All deliverables within the project were peer-reviewed by two project members from a different area. Most deliverables were developed in two or three iterations and underwent an evaluation. We contribute a comprehensive quality modelling and assessment approach: (1) A meta quality model defines the structure of operationalised quality models. It includes the concept of a product factor, which bridges the gap between concrete measurements and abstract quality aspects, and allows modularisation to create modules for specific domains. (2) A largely technology-independent base quality model reduces the effort and complexity of building quality models for specific domains. For Java and C# systems, we refined it with about 300 concrete product factors and 500 measures. (3) A concrete and comprehensive quality assessment approach makes use of the concepts in the meta-model. (4) An empirical evaluation of the above results using real-world software systems. (5) The extensive, open-source tool support is in a mature state. (6) The model for embedded software systems is a proof-of-concept for domain-specific quality models. We provide a broad basis for the development and application of quality models in industrial practice as well as a basis for further extension, validation and comparison with other approaches in research.

SEDec 8, 2023

Operationalizing Assurance Cases for Data Scientists: A Showcase of Concepts and Tooling in the Context of Test Data Quality for Machine Learning

Lisa Jöckel, Michael Kläs, Janek Groß et al.

Assurance Cases (ACs) are an established approach in safety engineering to argue quality claims in a structured way. In the context of quality assurance for Machine Learning (ML)-based software components, ACs are also being discussed and appear promising. Tools for operationalizing ACs do exist, yet mainly focus on supporting safety engineers on the system level. However, assuring the quality of an ML component within the system is commonly the responsibility of data scientists, who are usually less familiar with these tools. To address this gap, we propose a framework to support the operationalization of ACs for ML components based on technologies that data scientists use on a daily basis: Python and Jupyter Notebook. Our aim is to make the process of creating ML-related evidence in ACs more effective. Results from the application of the framework, documented through notebooks, can be integrated into existing AC tools. We illustrate the application of the framework on an example excerpt concerned with the quality of the test data.

LGMay 24, 2023

Timeseries-aware Uncertainty Wrappers for Uncertainty Quantification of Information-Fusion-Enhanced AI Models based on Machine Learning

Janek Groß, Michael Kläs, Lisa Jöckel et al.

As the use of Artificial Intelligence (AI) components in cyber-physical systems is becoming more common, the need for reliable system architectures arises. While data-driven models excel at perception tasks, model outcomes are usually not dependable enough for safety-critical applications. In this work,we present a timeseries-aware uncertainty wrapper for dependable uncertainty estimates on timeseries data. The uncertainty wrapper is applied in combination with information fusion over successive model predictions in time. The application of the uncertainty wrapper is demonstrated with a traffic sign recognition use case. We show that it is possible to increase model accuracy through information fusion and additionally increase the quality of uncertainty estimates through timeseries-aware input quality features.

AIFeb 10, 2022

Integrating Testing and Operation-related Quantitative Evidences in Assurance Cases to Argue Safety of Data-Driven AI/ML Components

Michael Kläs, Lisa Jöckel, Rasmus Adler et al.

In the future, AI will increasingly find its way into systems that can potentially cause physical harm to humans. For such safety-critical systems, it must be demonstrated that their residual risk does not exceed what is acceptable. This includes, in particular, the AI components that are part of such systems' safety-related functions. Assurance cases are an intensively discussed option today for specifying a sound and comprehensive safety argument to demonstrate a system's safety. In previous work, it has been suggested to argue safety for AI components by structuring assurance cases based on two complementary risk acceptance criteria. One of these criteria is used to derive quantitative targets regarding the AI. The argumentation structures commonly proposed to show the achievement of such quantitative targets, however, focus on failure rates from statistical testing. Further important aspects are only considered in a qualitative manner -- if at all. In contrast, this paper proposes a more holistic argumentation structure for having achieved the target, namely a structure that integrates test results with runtime aspects and the impact of scope compliance and test data quality in a quantitative manner. We elaborate different argumentation options, present the underlying mathematical considerations, and discuss resulting implications for their practical application. Using the proposed argumentation structure might not only increase the integrity of assurance cases but may also allow claims on quantitative targets that would not be justifiable otherwise.

LGJan 10, 2022

A Study on Mitigating Hard Boundaries of Decision-Tree-based Uncertainty Estimates for AI Models

Pascal Gerber, Lisa Jöckel, Michael Kläs

Outcomes of data-driven AI models cannot be assumed to be always correct. To estimate the uncertainty in these outcomes, the uncertainty wrapper framework has been proposed, which considers uncertainties related to model fit, input quality, and scope compliance. Uncertainty wrappers use a decision tree approach to cluster input quality related uncertainties, assigning inputs strictly to distinct uncertainty clusters. Hence, a slight variation in only one feature may lead to a cluster assignment with a significantly different uncertainty. Our objective is to replace this with an approach that mitigates hard decision boundaries of these assignments while preserving interpretability, runtime complexity, and prediction performance. Five approaches were selected as candidates and integrated into the uncertainty wrapper framework. For the evaluation based on the Brier score, datasets for a pedestrian detection use case were generated using the CARLA simulator and YOLOv3. All integrated approaches achieved a softening, i.e., smoothing, of uncertainty estimation. Yet, compared to decision trees, they are not so easy to interpret and have higher runtime complexity. Moreover, some components of the Brier score impaired while others improved. Most promising regarding the Brier score were random forests. In conclusion, softening hard decision tree boundaries appears to be a trade-off decision.

SEOct 27, 2021

From Complexity Measurement to Holistic Quality Evaluation for Automotive Software Development

Jens Heidrich, Michael Kläs, Andreas Morgenstern et al.

In recent years, the role and the importance of software in the automotive domain have changed dramatically. Being able to systematically evaluate and manage software quality is becoming even more crucial. In practice, however, we still find a largely static approach for measuring software quality based on a predefined list of complexity metrics with static thresholds to fulfill. We propose using a more flexible framework instead, which systematically derives measures and evaluation rules based on the goals and context of a development project.

SEAug 31, 2021

Towards a Common Testing Terminology for Software Engineering and Data Science Experts

Lisa Jöckel, Thomas Bauer, Michael Kläs et al.

Analytical quality assurance, especially testing, is an integral part of software-intensive system development. With the increased usage of Artificial Intelligence (AI) and Machine Learning (ML) as part of such systems, this becomes more difficult as well-understood software testing approaches cannot be applied directly to the AI-enabled parts of the system. The required adaptation of classical testing approaches and the development of new concepts for AI would benefit from a deeper understanding and exchange between AI and software engineering experts. We see the different terminologies used in the two communities as a major obstacle on this way. As we consider a mutual understanding of the testing terminology a key, this paper contributes a mapping between the most important concepts from classical software testing and AI testing. In the mapping, we highlight differences in the relevance and naming of the mapped concepts.

LGNov 28, 2018

Towards Identifying and Managing Sources of Uncertainty in AI and Machine Learning Models - An Overview

Michael Kläs

Quantifying and managing uncertainties that occur when data-driven models such as those provided by AI and machine learning methods are applied is crucial. This whitepaper provides a brief motivation and first overview of the state of the art in identifying and quantifying sources of uncertainty for data-driven components as well as means for analyzing their impact.

SENov 14, 2016

The Quamoco Product Quality Modelling and Assessment Approach

Stefan Wagner, Klaus Lochmann, Lars Heinemann et al.

Published software quality models either provide abstract quality attributes or concrete quality assessments. There are no models that seamlessly integrate both aspects. In the project Quamoco, we built a comprehensive approach with the aim to close this gap. For this, we developed in several iterations a meta quality model specifying general concepts, a quality base model covering the most important quality factors and a quality assessment approach. The meta model introduces the new concept of a product factor, which bridges the gap between concrete measurements and abstract quality aspects. Product factors have measures and instruments to operationalise quality by measurements from manual inspection and tool analysis. The base model uses the ISO 25010 quality attributes, which we refine by 200 factors and 600 measures for Java and C# systems. We found in several empirical validations that the assessment results fit to the expectations of experts for the corresponding systems. The empirical analyses also showed that several of the correlations are statistically significant and that the maintainability part of the base model has the highest correlation, which fits to the fact that this part is the most comprehensive. Although we still see room for extending and improving the base model, it shows a high correspondence with expert opinions and hence is able to form the basis for repeatable and understandable quality assessments in practice.

SEMar 21, 2014

Comprehensive Landscapes for Software-related Quality Models

Michael Kläs, Jens Heidrich, Jürgen Münch et al.

Managing quality (such as service availability or process adherence) during the development, operation, and maintenance of software(-intensive) systems and services is a challenging task. Although many organizations need to define, control, measure, and improve various quality aspects of their devel- opment artifacts and processes, nearly no guidance is available on how to select, adapt, define, combine, use, and evolve quality models. Catalogs of quality models as well as selection and tailoring processes are widely missing. One essential reason for this tremendous lack of support is that software development is a highly context-dependent process. Therefore, quality models always need to be adaptable to the respective project goals and contexts. A first step towards better support for selecting and adapting quality models can be seen in a classification of existing quality models, especially with respect to their suitability for different purposes and contexts. Such a classification of quality models can be applied to provide an integrated overview of the variety of quality models. This article presents the idea of so called comprehensive quality model landscapes (CQMLs), which provide a classification scheme for quality models and help to get an overview of existing quality models and their relationships. The article describes the usage goals for such landscapes, presents a classification scheme, presents the initial concept of such landscapes, illustrates the concept with selected examples, and sketches open questions and future work.

SEJan 13, 2014

Predicting Defect Content and Quality Assurance Effectiveness by Combining Expert Judgment and Defect Data - A Case Study

Michael Kläs, Haruka Nakao, Frank Elberzhager et al.

Planning quality assurance (QA) activities in a systematic way and controlling their execution are challenging tasks for companies that develop software or software-intensive systems. Both require estimation capabilities regarding the effectiveness of the applied QA techniques and the defect content of the checked artifacts. Existing approaches for these purposes need extensive measurement data from historical projects. Due to the fact that many companies do not collect enough data for applying these approaches (especially for the early project lifecycle), they typically base their QA planning and controlling solely on expert opinion. This article presents a hybrid method that combines commonly available measurement data and context-specific expert knowledge. To evaluate the method's applicability and usefulness, we conducted a case study in the context of independent verification and validation activities for critical software in the space domain. A hybrid defect content and effectiveness model was developed for the software requirements analysis phase and evaluated with available legacy data. One major result is that the hybrid model provides improved estimation accuracy when compared to applicable models based solely on data. The mean magnitude of relative error (MMRE) determined by cross-validation is 29.6% compared to 76.5% obtained by the most accurate data-based model.

SEJan 9, 2014

Model-based Product Quality Evaluation with Multi-Criteria Decision Analysis

Adam Trendowicz, Michael Kläs, Constanza Lampasona et al.

The ability to develop or evolve software or software-based systems/services with defined and guaranteed quality in a predictable way is becoming increasingly important. Essential - though not exclusive - prerequisites for this are the ability to model the relevant quality properties appropriately and the capability to perform reliable quality evaluations. Existing approaches for integrated quality modeling and evaluation are typically either narrowly focused or too generic and have proprietary ways for modeling and evaluating quality. This article sketches an ap- proach for modeling and evaluating quality properties in a uniform way, without losing the ability to build sufficiently detailed customized models for specific quality properties. The focus of this article is on the description of a multi-criteria aggregation mechanism that can be used for the evaluation. In addition, the underlying quality meta-model, an example application scenario, related work, initial application results, and an outlook on future research are presented.

SEJan 7, 2014

Transparent Combination of Expert and Measurement Data for Defect Prediction: An Industrial Case Study

Michael Kläs, Frank Elberzhager, Jürgen Münch et al.

Defining strategies on how to perform quality assurance (QA) and how to control such activities is a challenging task for organizations developing or maintaining software and software-intensive systems. Planning and adjusting QA activities could benefit from accurate estimations of the expected defect content of relevant artifacts and the effectiveness of important quality assurance activities. Combining expert opinion with commonly available measurement data in a hybrid way promises to overcome the weaknesses of purely data-driven or purely expert-based estimation methods. This article presents a case study of the hybrid estimation method HyDEEP for estimating defect content and QA effectiveness in the telecommunication domain. The specific focus of this case study is the use of the method for gaining quantitative predictions. This aspect has not been empirically analyzed in previous work. Among other things, the results show that for defect content estimation, the method performs significantly better statistically than purely data-based methods, with a relative error of 0.3 on average (MMRE).

SEDec 4, 2013

Adapting Software Quality Models: Practical Challenges, Approach, and First Empirical Results

Michael Kläs, Constanza Lampasona, Jürgen Münch

Measuring and evaluating software quality has become a fundamental task. Many models have been proposed to support stakeholders in dealing with software quality. However, in most cases, quality models do not fit perfectly for the target application context. Since approaches for efficiently adapting quality models are largely missing, many quality models in practice are built from scratch or reuse only high-level concepts of existing models. We present a tool-supported approach for the efficient adaptation of quality models. An initial empirical investigation indicates that the quality models obtained applying the proposed approach are considerably more consistently and appropriately adapted than those obtained following an ad-hoc approach. Further, we could observe that model adaptation is significantly more efficient (~factor 8) when using this approach.