SEJan 21, 2023
The Pipeline for the Continuous Development of Artificial Intelligence Models -- Current State of Research and PracticeMonika Steidl, Michael Felderer, Rudolf Ramler
Companies struggle to continuously develop and deploy AI models to complex production systems due to AI characteristics while assuring quality. To ease the development process, continuous pipelines for AI have become an active research area where consolidated and in-depth analysis regarding the terminology, triggers, tasks, and challenges is required. This paper includes a Multivocal Literature Review where we consolidated 151 relevant formal and informal sources. In addition, nine-semi structured interviews with participants from academia and industry verified and extended the obtained information. Based on these sources, this paper provides and compares terminologies for DevOps and CI/CD for AI, MLOps, (end-to-end) lifecycle management, and CD4ML. Furthermore, the paper provides an aggregated list of potential triggers for reiterating the pipeline, such as alert systems or schedules. In addition, this work uses a taxonomy creation strategy to present a consolidated pipeline comprising tasks regarding the continuous development of AI. This pipeline consists of four stages: Data Handling, Model Learning, Software Development and System Operations. Moreover, we map challenges regarding pipeline implementation, adaption, and usage for the continuous development of AI to these four stages.
SEMar 19, 2022
Data Smells: Categories, Causes and Consequences, and Detection of Suspicious Data in AI-based SystemsHarald Foidl, Michael Felderer, Rudolf Ramler
High data quality is fundamental for today's AI-based systems. However, although data quality has been an object of research for decades, there is a clear lack of research on potential data quality issues (e.g., ambiguous, extraneous values). These kinds of issues are latent in nature and thus often not obvious. Nevertheless, they can be associated with an increased risk of future problems in AI-based systems (e.g., technical debt, data-induced faults). As a counterpart to code smells in software engineering, we refer to such issues as Data Smells. This article conceptualizes data smells and elaborates on their causes, consequences, detection, and use in the context of AI-based systems. In addition, a catalogue of 36 data smells divided into three categories (i.e., Believability Smells, Understandability Smells, Consistency Smells) is presented. Moreover, the article outlines tool support for detecting data smells and presents the result of an initial smell detection on more than 240 real-world datasets.
48.7CRMar 16
Comparative Analysis of SRAM PUF Temperature Susceptibility on Embedded SystemsMartina Zeinzinger, Josef Langer, Florian Eibensteiner et al.
An SRAM Physical Unclonable Function (PUF) can distinguish SRAM modules by analyzing the inherent randomness of their start-up behavior. However, the effectiveness of this technique varies depending on the design and fabrication of the SRAM module. This study compares two similar microcontrollers, both equipped with on-chip SRAM, to determine which device produces a better SRAM PUF. Both microcontrollers are programmed with an identical SRAM PUF authentication routine and tested under varying ambient temperatures (ranging from 10 °C to 50 °C) to evaluate the impact of temperature on SRAM PUF performance. One embedded SRAM works significantly better than the other, even though the two models are closely related. The presented results can be used early in the design process to compare arbitrary on-chip SRAM models and see which is best suited for implementing an SRAM PUF.
9.0CRMar 11
An Approach for Safe and Secure Software Protection Supported by Symbolic ExecutionDaniel Dorfmeister, Flavio Ferrarotti, Bernhard Fischer et al.
We introduce a novel copy-protection method for industrial control software. With our method, a program executes correctly only on its target hardware and behaves differently on other machines. The hardware-software binding is based on Physically Unclonable Functions (PUFs). We use symbolic execution to guarantee the preservation of safety properties if the software is executed on a different machine, or if there is a problem with the PUF response. Moreover, we show that the protection method is also secure against reverse engineering.
SEFeb 10, 2021
Quality Assurance for AI-based Systems: Overview and ChallengesMichael Felderer, Rudolf Ramler
The number and importance of AI-based systems in all domains is growing. With the pervasive use and the dependence on AI-based systems, the quality of these systems becomes essential for their practical usage. However, quality assurance for AI-based systems is an emerging area that has not been well explored and requires collaboration between the SE and AI research communities. This paper discusses terminology and challenges on quality assurance for AI-based systems to set a baseline for that purpose. Therefore, we define basic concepts and characterize AI-based systems along the three dimensions of artifact type, process, and quality characteristics. Furthermore, we elaborate on the key challenges of (1) understandability and interpretability of AI models, (2) lack of specifications and defined requirements, (3) need for validation data and test input generation, (4) defining expected outcomes as test oracles, (5) accuracy and correctness measures, (6) non-functional properties of AI-based systems, (7) self-adaptive and self-learning characteristics, and (8) dynamic and frequently changing environments.
SEDec 17, 2019
Probabilistic Software Modeling: A Data-driven Paradigm for Software AnalysisHannes Thaller, Lukas Linsbauer, Rudolf Ramler et al.
Software systems are complex, and behavioral comprehension with the increasing amount of AI components challenges traditional testing and maintenance strategies.The lack of tools and methodologies for behavioral software comprehension leaves developers to testing and debugging that work in the boundaries of known scenarios. We present Probabilistic Software Modeling (PSM), a data-driven modeling paradigm for predictive and generative methods in software engineering. PSM analyzes a program and synthesizes a network of probabilistic models that can simulate and quantify the original program's behavior. The approach extracts the type, executable, and property structure of a program and copies its topology. Each model is then optimized towards the observed runtime leading to a network that reflects the system's structure and behavior. The resulting network allows for the full spectrum of statistical inferential analysis with which rich predictive and generative applications can be built. Applications range from the visualization of states, inferential queries, test case generation, and anomaly detection up to the stochastic execution of the modeled system. In this work, we present the modeling methodologies, an empirical study of the runtime behavior of software systems, and a comprehensive study on PSM modeled systems. Results indicate that PSM is a solid foundation for structural and behavioral software comprehension applications.
CRNov 15, 2019
Integrating Threat Modeling and Automated Test Case Generation into Industrialized Software Security TestingStefan Marksteiner, Rudolf Ramler, Hannes Sochor
Industrial Internet of Things (IIoT) application provide a whole new set of possibilities to drive efficiency of industrial production forward. However, with the higher degree of integration among systems, comes a plethora of newthreats to the latter, as they are not yet designed to be broadly reachable and interoperable. To mitigate these vast amount of new threats, systematic and automated test methods are necessary. This comprehensiveness can be achieved by thorough threat modeling. In order to automate security test, we present an approach to automate the testing process from threat modeling onward, closing the gap between threat modeling and automated test case generation.
SEJun 13, 2017
Exploring Code Clones in Programmable Logic Controller SoftwareHannes Thaller, Rudolf Ramler, Josef Pichler et al.
The reuse of code fragments by copying and pasting is widely practiced in software development and results in code clones. Cloning is considered an anti-pattern as it negatively affects program correctness and increases maintenance efforts. Programmable Logic Controller (PLC) software is no exception in the code clone discussion as reuse in development and maintenance is frequently achieved through copy, paste, and modification. Even though the presence of code clones may not necessary be a problem per se, it is important to detect, track and manage clones as the software system evolves. Unfortunately, tool support for clone detection and management is not commonly available for PLC software systems or limited to generic tools with a reduced set of features. In this paper, we investigate code clones in a real-world PLC software system based on IEC 61131-3 Structured Text and C/C++. We extended a widely used tool for clone detection with normalization support. Furthermore, we evaluated the different types and natures of code clones in the studied system and their relevance for refactoring. Results shed light on the applicability and usefulness of clone detection in the context of industrial automation systems and it demonstrates the benefit of adapting detection and management tools for IEC 611313-3 languages.