Thorsten Berger

SE
h-index12
14papers
393citations
Novelty34%
AI Score47

14 Papers

SEAug 12, 2024Code
A Large-Scale Study of Model Integration in ML-Enabled Software Systems

Yorick Sens, Henriette Knopp, Sven Peldszus et al.

The rise of machine learning (ML) and its integration into software systems has drastically changed development practices. While software engineering traditionally focused on manually created code artifacts with dedicated processes and architectures, ML-enabled systems require additional data-science methods and tools to create ML artifacts -- especially ML models and training data. However, integrating models into systems, and managing the many different artifacts involved, is far from trivial. ML-enabled systems can easily have multiple ML models that interact with each other and with traditional code in intricate ways. Unfortunately, while challenges and practices of building ML-enabled systems have been studied, little is known about the characteristics of real-world ML-enabled systems beyond isolated examples. Improving engineering processes and architectures for ML-enabled systems requires improving the empirical understanding of these systems. We present a large-scale study of 2,928 open-source ML-enabled software systems. We classified and analyzed them to determine system characteristics, model and code reuse practices, and architectural aspects of integrating ML models. Our findings show that these systems still mainly consist of traditional source code, and that ML model reuse through code duplication or pre-trained models is common. We also identified different ML integration patterns and related implementation practices. We hope that our results help improve practices for integrating ML models, bringing data science and software engineering closer together.

CRJan 22
120 Domain-Specific Languages for Security

Markus Krausz, Sven Peldszus, Francesco Regazzoni et al.

Security engineering, from security requirements engineering to the implementation of cryptographic protocols, is often supported by domain-specific languages (DSLs). Unfortunately, a lack of knowledge about these DSLs, such as which security aspects are addressed and when, hinders their effective use and further research. This systematic literature review examines 120 security-oriented DSLs based on six research questions concerning security aspects and goals, language-specific characteristics, integration into the software development lifecycle (SDLC), and effectiveness of the DSLs. We observe a high degree of fragmentation, which leads to opportunities for integration. We also need to improve the usability and evaluation of security DSLs.

HCDec 22, 2020Code
A Maturity Assessment Framework for Conversational AI Development Platforms

Johan Aronsson, Philip Lu, Daniel Strüber et al.

Conversational Artificial Intelligence (AI) systems have recently sky-rocketed in popularity and are now used in many applications, from car assistants to customer support. The development of conversational AI systems is supported by a large variety of software platforms, all with similar goals, but different focus points and functionalities. A systematic foundation for classifying conversational AI platforms is currently lacking. We propose a framework for assessing the maturity level of conversational AI development platforms. Our framework is based on a systematic literature review, in which we extracted common and distinguishing features of various open-source and commercial (or in-house) platforms. Inspired by language reference frameworks, we identify different maturity levels that a conversational AI development platform may exhibit in understanding and responding to user inputs. Our framework can guide organizations in selecting a conversational AI development platform according to their needs, as well as helping researchers and platform developers improving the maturity of their platforms.

ROOct 13, 2020Code
Behavior Trees in Action: A Study of Robotics Applications

Razan Ghzouli, Thorsten Berger, Einar Broch Johnsen et al.

Autonomous robots combine a variety of skills to form increasingly complex behaviors called missions. While the skills are often programmed at a relatively low level of abstraction, their coordination is architecturally separated and often expressed in higher-level languages or frameworks. Recently, the language of Behavior Trees gained attention among roboticists for this reason. Originally designed for computer games to model autonomous actors, Behavior Trees offer an extensible tree-based representation of missions. However, even though, several implementations of the language are in use, little is known about its usage and scope in the real world. How do behavior trees relate to traditional languages for describing behavior? How are behavior tree concepts used in applications? What are the benefits of using them? We present a study of the key language concepts in Behavior Trees and their use in real-world robotic applications. We identify behavior tree languages and compare their semantics to the most well-known behavior modeling languages: state and activity diagrams. We mine open source repositories for robotics applications that use the language and analyze this usage. We find that Behavior Trees are a pragmatic language, not fully specified, allowing projects to extend it even for just one model. Behavior trees clearly resemble the models-at-runtime paradigm. We contribute a dataset of real-world behavior models, hoping to inspire the community to use and further develop this language, associated tools, and analysis techniques.

17.2CRMay 8
Can I Check What I Designed? Mapping Security Design DSLs to Code Analyzers

Sven Peldszus, Frederik Reiche, Kevin Hermann et al.

When assessing the potential impact of code-level vulnerabilities, e.g., discovered by automated analyzers, it is essential to consider them in the context of the system's security design. However, this is a challenging task due to the abstraction gap between security design, often specified using security DSLs, and implementation. As we will show, even security experts lack a complete understanding of this relationship. Intrigued by this gap (and the general disconnect between secure design and secure implementation) we present a study of 66 design-level security DSLs and 559 security checks from 36 code-level analyzers. We identify what concepts are common to both and capture them in the SecLan model, which has been validated by 22 security experts. Based on this, we investigate the relationship between DSLs and analyzers quantitatively and explore it qualitatively together with 9 security experts. We learn that there are few commonalities between design-level and implementation-level security; security checks are often described by overly general weaknesses, resulting in many non-obvious potential relationships between security DSLs and analyzers; and even security experts are overwhelmed by this complexity. We provide an empirical basis that helps practitioners and researchers better understand the gap and serves as a first step toward bridging it.

SEJul 29, 2025
Vibe Coding as a Reconfiguration of Intent Mediation in Software Development: Definition, Implications, and Research Agenda

Christian Meske, Tobias Hermanns, Esther von der Weiden et al.

Software development is undergoing a fundamental transformation as vibe coding becomes widespread, with large portions of contemporary codebases now being AI-generated. The disconnect between rapid adoption and limited conceptual understanding highlights the need for an inquiry into this emerging paradigm. Drawing on an intent perspective and historical analysis, we define vibe coding as a software development paradigm where humans and generative AI engage in collaborative flow to co-create software artifacts through natural language dialogue, shifting the mediation of developer intent from deterministic instruction to probabilistic inference. By intent mediation, we refer to the fundamental process through which developers translate their conceptual goals into representations that computational systems can execute. Our results show that vibe coding reconfigures cognitive work by redistributing epistemic labor between humans and machines, shifting the expertise in the software development process away from traditional areas such as design or technical implementation toward collaborative orchestration. We identify key opportunities, including democratization, acceleration, and systemic leverage, alongside risks, such as black box codebases, responsibility gaps, and ecosystem bias. We conclude with a research agenda spanning human-, technology-, and organization-centered directions to guide future investigations of this paradigm.

SEDec 2, 2021
A Generator Framework For Evolving Variant-Rich Software

Christoph Derks, Daniel Strüber, Thorsten Berger

Evolving software is challenging, even more when it exists in many different variants. Such software evolves not only in time, but also in space--another dimension of complexity. While evolution in space is supported by a variety of product-line and variability management tools, many of which originating from research, their level of evaluation varies significantly, which threatens their relevance for practitioners and future research. Many tools have only been evaluated on ad hoc datasets, minimal examples or available preprocessor-based product lines, missing the early clone & own phases and the re-engineering into configurable platforms--large parts of the actual evolution lifecycle of variant-rich systems. Our long-term goal is to provide benchmarks to increase the maturity of evaluating such tools. However, providing manually curated benchmarks that cover the whole evolution lifecycle and that are detailed enough to serve as ground truths, is challenging. We present the framework vpbench to generates source-code histories of variant-rich systems. Vpbench comprises several modular generators relying on evolution operators that systematically and automatically evolve real codebases and document the evolution in detail. We provide simple and more advanced generators--e.g., relying on code transplantation techniques to obtain whole features from external, real-world projects. We define requirements and demonstrate how vpbench addresses them for the generated version histories, focusing on support for evolution in time and space, the generation of detailed meta-data about the evolution, also considering compileability and extensibility.

SEAug 18, 2021
Towards Mapping Control Theory and Software Engineering Properties using Specification Patterns

Ricardo Caldas, Razan Ghzouli, Alessandro V. Papadopoulos et al.

A traditional approach to realize self-adaptation in software engineering (SE) is by means of feedback loops. The goals of the system can be specified as formal properties that are verified against models of the system. On the other hand, control theory (CT) provides a well-established foundation for designing feedback loop systems and providing guarantees for essential properties, such as stability, settling time, and steady state error. Currently, it is an open question whether and how traditional SE approaches to self-adaptation consider properties from CT. Answering this question is challenging given the principle differences in representing properties in both fields. In this paper, we take a first step to answer this question. We follow a bottom up approach where we specify a control design (in Simulink) for a case inspired by Scuderia Ferrari (F1) and provide evidence for stability and safety. The design is then transferred into code (in C) that is further optimized. Next, we define properties that enable verifying whether the control properties still hold at code level. Then, we consolidate the solution by mapping the properties in both worlds using specification patterns as common language and we verify the correctness of this mapping. The mapping offers a reusable artifact to solve similar problems. Finally, we outline opportunities for future work, particularly to refine and extend the mapping and investigate how it can improve the engineering of self-adaptive systems for both SE and CT engineers.

SEApr 13, 2021
Feature-Oriented Defect Prediction: Scenarios, Metrics, and Classifiers

Mukelabai Mukelabai, Stefan Strüder, Daniel Strüber et al.

Several software defect prediction techniques have been developed over the past decades. These techniques predict defects at the granularity of typical software assets, such as components and files. In this paper, we investigate feature-oriented defect prediction: predicting defects at the granularity of features -- domain-entities that represent software functionality and often cross-cut software assets. Feature-oriented defect prediction can be beneficial since: (i) some features might be more error-prone than others, (ii) characteristics of defective features might be useful to predict other error-prone features, and (iii) feature-specific code might be prone to faults arising from feature interactions. We explore the feasibility and solution space for feature-oriented defect prediction. Our study relies on 12 software projects from which we analyzed 13,685 bug-introducing and corrective commits, and systematically generated 62,868 training and test datasets to evaluate classifiers, metrics, and scenarios. The datasets were generated based on the 13,685 commits, 81 releases, and 24, 532 permutations of our 12 projects depending on the scenario addressed. We covered scenarios such as just-in-time (JIT) and cross-project defect prediction. Our results confirm the feasibility of feature-oriented defect prediction. We found the best performance (i.e., precision and robustness) when using the Random Forest classifier, with process and structure metrics. Surprisingly, single-project JIT and release-level predictions had median AUC-ROC values greater than 95% and 90% respectively, contrary to studies that assert poor performance due to insufficient training data. We also found that a model trained on release-level data from one of the twelve projects could predict defect-proneness of features in the other eleven projects with median AUC-ROC of 82%, without retraining.

SEFeb 28, 2021
Seamless Variability Management With the Virtual Platform

Wardah Mahmood, Daniel Strüber, Thorsten Berger et al.

Customization is a general trend in software engineering, demanding systems that support variable stakeholder requirements. Two opposing strategies are commonly used to create variants: software clone & own and software configuration with an integrated platform. Organizations often start with the former, which is cheap, agile, and supports quick innovation, but does not scale. The latter scales by establishing an integrated platform that shares software assets between variants, but requires high up-front investments or risky migration processes. So, could we have a method that allows an easy transition or even combine the benefits of both strategies? We propose a method and tool that supports a truly incremental development of variant-rich systems, exploiting a spectrum between both opposing strategies. We design, formalize, and prototype the variability-management framework virtual platform. It bridges clone & own and platform-oriented development. Relying on programming-language-independent conceptual structures representing software assets, it offers operators for engineering and evolving a system, comprising: traditional, asset-oriented operators and novel, feature-oriented operators for incrementally adopting concepts of an integrated platform. The operators record meta-data that is exploited by other operators to support the transition. Among others, they eliminate expensive feature-location effort or the need to trace clones. Our evaluation simulates the evolution of a real-world, clone-based system, measuring its costs and benefits.

SEFeb 13, 2021
Asset Management in Machine Learning: A Survey

Samuel Idowu, Daniel Strüber, Thorsten Berger

Machine Learning (ML) techniques are becoming essential components of many software systems today, causing an increasing need to adapt traditional software engineering practices and tools to the development of ML-based software systems. This need is especially pronounced due to the challenges associated with the large-scale development and deployment of ML systems. Among the most commonly reported challenges during the development, production, and operation of ML-based systems are experiment management, dependency management, monitoring, and logging of ML assets. In recent years, we have seen several efforts to address these challenges as witnessed by an increasing number of tools for tracking and managing ML experiments and their assets. To facilitate research and practice on engineering intelligent systems, it is essential to understand the nature of the current tool support for managing ML assets. What kind of support is provided? What asset types are tracked? What operations are offered to users for managing those assets? We discuss and position ML asset management as an important discipline that provides methods and tools for ML assets as structures and the ML development activities as their operations. We present a feature-based survey of 17 tools with ML asset management support identified in a systematic search. We overview these tools' features for managing the different types of assets used for engineering ML-based systems and performing experiments. We found that most of the asset management support depends on traditional version control systems, while only a few tools support an asset granularity level that differentiates between important ML assets, such as datasets and models.

SEDec 30, 2020
ConfigFix: Interactive Configuration Conflict Resolution for the Linux Kernel

Patrick Franz, Thorsten Berger, Ibrahim Fayaz et al.

Highly configurable systems are highly complex systems, with the Linux kernel arguably being one of the most well-known ones. Since 2007, it has been a frequent target of the research community, conducting empirical studies and building dedicated methods and tools for analyzing, configuring, testing, optimizing, and maintaining the kernel in the light of its vast configuration space. However, despite a large body of work, mainly bug fixes that were the result of such research made it back into the kernel's source tree. Unfortunately, Linux users still struggle with kernel configuration and resolving configuration conflicts, since the kernel largely lacks automated support. Additionally, there are technical and community requirements for supporting automated conflict resolution in the kernel, such as, for example, using a pure C-based solution that uses only compatible third-party libraries (if any). With the aim of contributing back to the Linux community, we present CONFIGFIX, a tooling that we integrated with the kernel configurator, that is purely implemented in C, and that is finally a working solution able to produce fixes for configuration conflicts. In this experience report, we describe our experiences ranging over a decade of building upon the large body of work from research on the Linux kernel configuration mechanisms as well as how we designed and realized CONFIGFIX while adhering to the Linux kernel's community requirements and standards. While CONFIGFIX helps Linux kernel users obtaining their desired configuration, the sound semantic abstraction we implement provides the basis for many of the above techniques supporting kernel configuration, helping researchers and kernel developers.

SEJun 18, 2020
Robotics Software Engineering: A Perspective from the Service Robotics Domain

Sergio García, Daniel Strüber, Davide Brugali et al.

Robots that support humans by performing useful tasks (a.k.a., service robots) are booming worldwide. In contrast to industrial robots, the development of service robots comes with severe software engineering challenges, since they require high levels of robustness and autonomy to operate in highly heterogeneous environments. As a domain with critical safety implications, service robotics faces a need for sound software development practices. In this paper, we present the first large-scale empirical study to assess the state of the art and practice of robotics software engineering. We conducted 18 semi-structured interviews with industrial practitioners working in 15 companies from 9 different countries and a survey with 156 respondents (from 26 countries) from the robotics domain. Our results provide a comprehensive picture of (i) the practices applied by robotics industrial and academic practitioners, including processes, paradigms, languages, tools, frameworks, and reuse practices, (ii) the distinguishing characteristics of robotics software engineering, and (iii) recurrent challenges usually faced, together with adopted solutions. The paper concludes by discussing observations, derived hypotheses, and proposed actions for researchers and practitioners.

SEJan 7, 2019
Specification Patterns for Robotic Missions

Claudio Menghi, Christos Tsigkanos, Patrizio Pelliccione et al.

Mobile and general-purpose robots increasingly support our everyday life, requiring dependable robotics control software. Creating such software mainly amounts to implementing their complex behaviors known as missions. Recognizing the need, a large number of domain-specific specification languages has been proposed. These, in addition to traditional logical languages, allow the use of formally specified missions for synthesis, verification, simulation, or guiding the implementation. For instance, the logical language LTL is commonly used by experts to specify missions, as an input for planners, which synthesize the behavior a robot should have. Unfortunately, domain-specific languages are usually tied to specific robot models, while logical languages such as LTL are difficult to use by non-experts. We present a catalog of 22 mission specification patterns for mobile robots, together with tooling for instantiating, composing, and compiling the patterns to create mission specifications. The patterns provide solutions for recurrent specification problems, each of which detailing the usage intent, known uses, relationships to other patterns, and---most importantly---a template mission specification in temporal logic. Our tooling produces specifications expressed in the LTL and CTL temporal logics to be used by planners, simulators, or model checkers. The patterns originate from 245 realistic textual mission requirements extracted from the robotics literature, and they are evaluated upon a total of 441 real-world mission requirements and 1251 mission specifications. Five of these reflect scenarios we defined with two well-known industrial partners developing human-size robots. We validated our patterns' correctness with simulators and two real robots.