Anne Koziolek

h-index36

7papers

109citations

Novelty35%

AI Score45

Ranked #69,029 of 205,806 authors (top 34%)#770 in SE (top 22%)

7 Papers

SEOct 30, 2025

A Research Roadmap for Augmenting Software Engineering Processes and Software Products with Generative AI

Domenico Amalfitano, Andreas Metzger, Marco Autili et al.

Generative AI (GenAI) is rapidly transforming software engineering (SE) practices, influencing how SE processes are executed, as well as how software systems are developed, operated, and evolved. This paper applies design science research to build a roadmap for GenAI-augmented SE. The process consists of three cycles that incrementally integrate multiple sources of evidence, including collaborative discussions from the FSE 2025 "Software Engineering 2030" workshop, rapid literature reviews, and external feedback sessions involving peers. McLuhan's tetrads were used as a conceptual instrument to systematically capture the transforming effects of GenAI on SE processes and software products.The resulting roadmap identifies four fundamental forms of GenAI augmentation in SE and systematically characterizes their related research challenges and opportunities. These insights are then consolidated into a set of future research directions. By grounding the roadmap in a rigorous multi-cycle process and cross-validating it among independent author teams and peers, the study provides a transparent and reproducible foundation for analyzing how GenAI affects SE processes, methods and tools, and for framing future research within this rapidly evolving area. Based on these findings, the article finally makes ten predictions for SE in the year 2030.

21.6SEMar 19

Who's Who? LLM-assisted Software Traceability with Architecture Entity Recognition

Dominik Fuchß, Haoyu Liu, Sophie Corallo et al.

Identifying architecturally relevant entities in textual artifacts is crucial for Traceability Link Recovery (TLR) between Software Architecture Documentation (SAD) and source code. While Software Architecture Models (SAMs) can bridge the semantic gap between these artifacts, their manual creation is time-consuming. LLMs offer new capabilities for extracting architectural entities from SAD and source code to construct SAMs automatically or establish direct trace links. This paper extends our ICSA 2025 paper [19], which introduced Extracting Architecture (ExArch) for LLM-based architecture component name extraction. The extension contributes the novel Architecture Traceability with Entity Matching via Semantic inference (ArTEMiS) approach, an extended evaluation with additional LLMs, configurations, a revised benchmark, and a combined evaluation of both approaches. Specifically, this paper presents the following approaches: ExArch extracts component names as simple SAMs from SAD and source code to eliminate the need for manual SAM creation, while ArTEMiS identifies architectural entities in documentation and matches them with (manually or automatically generated) SAM entities. Our evaluation compares against state-of-the-art approaches SWATTR, TransArC and ArDoCode. TransArC achieves strong performance (F1: 0.87) but requires manually created SAMs; ExArch achieves comparable results (F1: 0.86) using only SAD and code. ArTEMiS is on par with the traditional heuristic-based SWATTR (F1: 0.81) and can successfully replace it when integrated with TransArC. The combination of ArTEMiS and ExArch outperforms ArDoCode, the best baseline without manual SAMs. Our results demonstrate that LLMs can effectively identify architectural entities in textual artifacts, enabling automated SAM generation and TLR, making architecture-code traceability more practical and accessible.

36.9SEMar 27

Round-trip Engineering for Tactical DDD: A Constraint-Based Vision for the Masses

Weixing Zhang, Mario Herb, Martin Armbruster et al.

Despite Domain-Driven Design's proven value in managing complex business logic, a fundamental semantic expressiveness gap persists between generic modeling languages and tactical DDD patterns, causing continuous divergence between design intent and implementation. We envision a constraint-based tactical modeling environment that transforms abstract architectural principles into explicit, tool-enforced engineering constraints. At its core is a DDD-native metamodel where tactical patterns are first-class modeling primitives, coupled with a real-time constraint verification engine that prevents architectural violations during modeling, and bidirectional synchronization mechanisms that maintain model-code consistency through round-trip engineering. This approach aims to democratize tactical DDD by embedding expert-level architectural knowledge directly into modeling constraints, enabling small teams and junior developers to build complex business systems without sacrificing long-term maintainability. By lowering the technical barriers to DDD adoption, we envision transforming tactical DDD from an elite practice requiring continuous expert oversight into an accessible engineering discipline with tool-supported verification.

SEFeb 12

Leveraging LLMs to support co-evolution between definitions and instances of textual DSLs: A Systematic Evaluation

Weixing Zhang, Bowen Jiang, Yuhong Fu et al.

Software languages evolve over time for reasons such as feature additions. When grammars evolve, textual instances that originally conformed to them may become outdated. While model-driven engineering provides many techniques for co-evolving models with metamodel changes, these approaches are not designed for textual DSLs and may lose human-relevant information such as layout and comments. This study systematically evaluates the potential of large language models (LLMs) for co-evolving grammars and instances of textual DSLs. Using Claude Sonnet 4.5 and GPT-5.2 across ten case languages with ten runs each, we assess both correctness and preservation of human-oriented information. Results show strong performance on small-scale cases ($\geq$94% precision and recall for instances requiring fewer than 20 modified lines), but performance degraded with scale: Claude maintains 85% recall at 40 lines, while GPT fails on the largest instances. Response time increases substantially with instance size, and grammar evolution complexity and deletion granularity affect performance more than change type. These findings clarify when LLM-based co-evolution is effective and where current limitations remain.

SEJun 30, 2020

Incremental Calibration of Architectural Performance Models with Parametric Dependencies

Manar Mazkatli, David Monschein, Johannes Grohmann et al.

Architecture-based Performance Prediction (AbPP) allows evaluation of the performance of systems and to answer what-if questions without measurements for all alternatives. A difficulty when creating models is that Performance Model Parameters (PMPs, such as resource demands, loop iteration numbers and branch probabilities) depend on various influencing factors like input data, used hardware and the applied workload. To enable a broad range of what-if questions, Performance Models (PMs) need to have predictive power beyond what has been measured to calibrate the models. Thus, PMPs need to be parametrized over the influencing factors that may vary. Existing approaches allow for the estimation of parametrized PMPs by measuring the complete system. Thus, they are too costly to be applied frequently, up to after each code change. They do not keep also manual changes to the model when recalibrating. In this work, we present the Continuous Integration of Performance Models (CIPM), which incrementally extracts and calibrates the performance model, including parametric dependencies. CIPM responds to source code changes by updating the PM and adaptively instrumenting the changed parts. To allow AbPP, CIPM estimates the parametrized PMPs using the measurements (generated by performance tests or executing the system in production) and statistical analysis, e.g., regression analysis and decision trees. Additionally, our approach responds to production changes (e.g., load or deployment changes) and calibrates the usage and deployment parts of PMs accordingly. For the evaluation, we used two case studies. Evaluation results show that we were able to calibrate the PM incrementally and accurately.

SEJan 29, 2018

Rapid Testing of IaaS Resource Management Algorithms via Cloud Middleware Simulation

Christian Stier, Jörg Domaschka, Anne Koziolek et al.

Infrastructure as a Service (IaaS) Cloud services allow users to deploy distributed applications in a virtualized environment without having to customize their applications to a specific Platform as a Service (PaaS) stack. It is common practice to host multiple Virtual Machines (VMs) on the same server to save resources. Traditionally, IaaS data center management required manual effort for optimization, e.g. by consolidating VM placement based on changes in usage patterns. Many resource management algorithms and frameworks have been developed to automate this process. Resource management algorithms are typically tested via experimentation or using simulation. The main drawback of both approaches is the high effort required to conduct the testing. Existing Cloud or IaaS simulators require the algorithm engineer to reimplement their algorithm against the simulator's API. Furthermore, the engineer manually needs to define the workload model used for algorithm testing. We propose an approach for the simulative analysis of IaaS Cloud infrastructure that allows algorithm engineers and data center operators to eval- uate optimization algorithms without investing additional effort to reimplement them in a simulation environment. By leveraging runtime monitoring data, we automatically construct the simula- tion models used to test the algorithms. Our validation shows that algorithm tests conducted using our IaaS Cloud simulator match the measured behavior on actual hardware.

SEAug 18, 2015

Performance-oriented DevOps: A Research Agenda

Andreas Brunnert, Andre van Hoorn, Felix Willnecker et al.

DevOps is a trend towards a tighter integration between development (Dev) and operations (Ops) teams. The need for such an integration is driven by the requirement to continuously adapt enterprise applications (EAs) to changes in the business environment. As of today, DevOps concepts have been primarily introduced to ensure a constant flow of features and bug fixes into new releases from a functional perspective. In order to integrate a non-functional perspective into these DevOps concepts this report focuses on tools, activities, and processes to ensure one of the most important quality attributes of a software system, namely performance. Performance describes system properties concerning its timeliness and use of resources. Common metrics are response time, throughput, and resource utilization. Performance goals for EAs are typically defined by setting upper and/or lower bounds for these metrics and specific business transactions. In order to ensure that such performance goals can be met, several activities are required during development and operation of these systems as well as during the transition from Dev to Ops. Activities during development are typically summarized by the term Software Performance Engineering (SPE), whereas activities during operations are called Application Performance Management (APM). SPE and APM were historically tackled independently from each other, but the newly emerging DevOps concepts require and enable a tighter integration between both activity streams. This report presents existing solutions to support this integration as well as open research challenges in this area.