SEAug 12, 2024Code
A Large-Scale Study of Model Integration in ML-Enabled Software SystemsYorick Sens, Henriette Knopp, Sven Peldszus et al.
The rise of machine learning (ML) and its integration into software systems has drastically changed development practices. While software engineering traditionally focused on manually created code artifacts with dedicated processes and architectures, ML-enabled systems require additional data-science methods and tools to create ML artifacts -- especially ML models and training data. However, integrating models into systems, and managing the many different artifacts involved, is far from trivial. ML-enabled systems can easily have multiple ML models that interact with each other and with traditional code in intricate ways. Unfortunately, while challenges and practices of building ML-enabled systems have been studied, little is known about the characteristics of real-world ML-enabled systems beyond isolated examples. Improving engineering processes and architectures for ML-enabled systems requires improving the empirical understanding of these systems. We present a large-scale study of 2,928 open-source ML-enabled software systems. We classified and analyzed them to determine system characteristics, model and code reuse practices, and architectural aspects of integrating ML models. Our findings show that these systems still mainly consist of traditional source code, and that ML model reuse through code duplication or pre-trained models is common. We also identified different ML integration patterns and related implementation practices. We hope that our results help improve practices for integrating ML models, bringing data science and software engineering closer together.
CRJan 22
120 Domain-Specific Languages for SecurityMarkus Krausz, Sven Peldszus, Francesco Regazzoni et al.
Security engineering, from security requirements engineering to the implementation of cryptographic protocols, is often supported by domain-specific languages (DSLs). Unfortunately, a lack of knowledge about these DSLs, such as which security aspects are addressed and when, hinders their effective use and further research. This systematic literature review examines 120 security-oriented DSLs based on six research questions concerning security aspects and goals, language-specific characteristics, integration into the software development lifecycle (SDLC), and effectiveness of the DSLs. We observe a high degree of fragmentation, which leads to opportunities for integration. We also need to improve the usability and evaluation of security DSLs.
SEAug 19, 2021Code
Checking Security Compliance between Models and CodeKatja Tuma, Sven Peldszus, Daniel Strüber et al.
It is challenging to verify that the planned security mechanisms are actually implemented in the software. In the context of model-based development, the implemented security mechanisms must capture all intended security properties that were considered in the design models. Assuring this compliance manually is labor intensive and can be error-prone. This work introduces the first semi-automatic technique for secure data flow compliance checks between design models and code. We develop heuristic-based automated mappings between a design-level model (SecDFD, provided by humans) and a code-level representation (Program Model, automatically extracted from the implementation) in order to guide users in discovering compliance violations, and hence potential security flaws in the code. These mappings enable an automated, and project-specific static analysis of the implementation with respect to the desired security properties of the design model. We developed two types of security compliance checks and evaluated the entire approach on open source Java projects.
CRMay 8
Can I Check What I Designed? Mapping Security Design DSLs to Code AnalyzersSven Peldszus, Frederik Reiche, Kevin Hermann et al.
When assessing the potential impact of code-level vulnerabilities, e.g., discovered by automated analyzers, it is essential to consider them in the context of the system's security design. However, this is a challenging task due to the abstraction gap between security design, often specified using security DSLs, and implementation. As we will show, even security experts lack a complete understanding of this relationship. Intrigued by this gap (and the general disconnect between secure design and secure implementation) we present a study of 66 design-level security DSLs and 559 security checks from 36 code-level analyzers. We identify what concepts are common to both and capture them in the SecLan model, which has been validated by 22 security experts. Based on this, we investigate the relationship between DSLs and analyzers quantitatively and explore it qualitatively together with 9 security experts. We learn that there are few commonalities between design-level and implementation-level security; security checks are often described by overly general weaknesses, resulting in many non-obvious potential relationships between security DSLs and analyzers; and even security experts are overwhelmed by this complexity. We provide an empirical basis that helps practitioners and researchers better understand the gap and serves as a first step toward bridging it.