Holger Giese

SE
h-index: 7
19 papers
489 citations
Novelty: 41%
AI Score: 46

19 Papers

SE · Apr 20
From Program Slices to Causal Clarity: Evaluating Faithful, Actionable LLM-Generated Failure Explanations via Context Partitioning and LLM-as-a-Judge

Julius Porbeck, Christian Medeiros Adriano, Holger Giese

Large language model (LLM)-based debugging systems can generate failure explanations, but these explanations may be incomplete or incorrect. Misleading explanations are harmful for downstream tasks (e.g., bug triage, bug fixing). We investigate how explanation quality is affected by various LLM context configurations. Existing work predominantly treats LLM-generated failure explanations as an ad hoc by-product of debugging or repair workflows, using generic prompting over undifferentiated artifacts such as code, tests, and error messages rather than targeting explanations as a first-class output with dedicated quality assessment. Consequently, existing approaches provide limited support for assessing whether these explanations capture the underlying fault-error-failure mechanism and for actionable next steps, and most techniques instead prioritize task success (e.g., patch correctness or review quality) over the explicit causal explanation quality. We systematically vary the debugging information to study how distinct context compositions affect the quality of LLM-generated failure explanations. Across 93 context configurations on real bugs and three economically viable models (gpt-5-mini, DeepSeek-V3.2, and Grok-4.1-fast), we evaluate explanations with six criteria and validate the LLM-as-a-judge scores against human ratings in a user study. Our results indicate that explanation quality is causally affected by context composition. Evidence-rich, failure-specific artifacts improve causal and action-oriented quality, whereas overly large contexts tend to yield vague explanations. Higher explanation-score quartiles are associated with higher downstream repair pass rates and, for some models, with fixes that are closer to the reference minimal fixes. In contrast, low-score quartiles can even underperform the no-explanation baseline. Reproduction package is publicly available.

LO · Apr 30
Towards Neuro-symbolic Causal Rule Synthesis, Verification, and Evaluation Grounded in Legal and Safety Principles

Zainab Rehan, Christian Medeiros Adriano, Sona Ghahremani et al.

Rule-based systems remain central in safety-critical domains but often struggle with scalability, brittleness, and goal misspecification. These limitations can lead to reward hacking and failures in formal verification, as AI systems tend to optimize for narrow objectives. In previous research, we developed a neuro-symbolic causal framework that integrates first-order logic abduction trees, structural causal models, and deep reinforcement learning within a MAPE-K loop to provide explainable adaptations under distribution shifts. In this paper, we extend that framework by introducing a meta-level layer designed to mitigate goal misspecification and support scalable rule maintenance. This layer consists of a Goal/Rule Synthesizer and a Rule Verification Engine, which iteratively refine a formal rule theory from high-level natural-language goals and principles provided by human experts. The synthesis pipeline employs large language models (LLMs) to: (1) decompose goals into candidate causes, (2) consolidate semantics to remove redundancies, (3) translate them into candidate first-order rules, and (4) compose necessary and sufficient causal sets. The verification pipeline then performs (1) syntax and schema validation, (2) logical consistency analysis, and (3) safety and invariant checks before integrating verified rules into the knowledge base. We evaluated our approach with a proof-of-concept implementation in two autonomous driving scenarios. Results indicate that, given human-specified goals and principles, the pipeline can successfully derive minimal necessary and sufficient rule sets and formalize them as logical constraints. These findings suggest that the pipeline supports incremental, modular, and traceable rule synthesis grounded in established legal and safety principles.

AI · Jul 18, 2025
Causal Knowledge Transfer for Multi-Agent Reinforcement Learning in Dynamic Environments

Kathrin Korte, Christian Medeiros Adriano, Sona Ghahremani et al.

[Context] Multi-agent reinforcement learning (MARL) has achieved notable success in environments where agents must learn coordinated behaviors. However, transferring knowledge across agents remains challenging in non-stationary environments with changing goals. [Problem] Traditional knowledge transfer methods in MARL struggle to generalize, and agents often require costly retraining to adapt. [Approach] This paper introduces a causal knowledge transfer framework that enables RL agents to learn and share compact causal representations of paths within a non-stationary environment. As the environment changes (new obstacles), agents' collisions require adaptive recovery strategies. We model each collision as a causal intervention instantiated as a sequence of recovery actions (a macro) whose effect corresponds to a causal knowledge of how to circumvent the obstacle while increasing the chances of achieving the agent's goal (maximizing cumulative reward). This recovery action macro is transferred online from a second agent and is applied in a zero-shot fashion, i.e., without retraining, just by querying a lookup model with local context information (collisions). [Results] Our findings reveal two key insights: (1) agents with heterogeneous goals were able to bridge about half of the gap between random exploration and a fully retrained policy when adapting to new environments, and (2) the impact of causal knowledge transfer depends on the interplay between environment complexity and agents' heterogeneous goals.
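The zero-shot lookup described in this abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `MacroLookup` class, the `(heading, obstacle side)` context encoding, and the rule of keeping the highest-reward macro per context are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical local context: (agent heading, side on which the obstacle was hit).
Context = Tuple[str, str]
# A macro is a sequence of primitive recovery actions.
Macro = List[str]


class MacroLookup:
    """Illustrative lookup table mapping a collision context to a recovery macro."""

    def __init__(self) -> None:
        self._table: Dict[Context, Tuple[Macro, float]] = {}

    def record(self, context: Context, macro: Macro, reward: float) -> None:
        """Store a macro shared by another agent, keeping the best one per context."""
        current = self._table.get(context)
        if current is None or reward > current[1]:
            self._table[context] = (list(macro), reward)

    def query(self, context: Context) -> Optional[Macro]:
        """Return the stored macro for this context without any retraining."""
        entry = self._table.get(context)
        return list(entry[0]) if entry is not None else None


lookup = MacroLookup()
lookup.record(("north", "left"), ["back", "right", "forward"], reward=0.4)
lookup.record(("north", "left"), ["right", "forward"], reward=0.9)
print(lookup.query(("north", "left")))   # ['right', 'forward']
print(lookup.query(("south", "right")))  # None
```

An unknown context returns `None`, in which case an agent would have to fall back to its own policy or to exploration.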

CV · May 6, 2024
Mind the Gap Between Synthetic and Real: Utilizing Transfer Learning to Probe the Boundaries of Stable Diffusion Generated Data

Leonhard Hennicke, Christian Medeiros Adriano, Holger Giese et al.

Generative foundation models like Stable Diffusion comprise a diverse spectrum of knowledge in computer vision with the potential for transfer learning, e.g., via generating data to train student models for downstream tasks. This could circumvent the necessity of collecting labeled real-world data, thereby presenting a form of data-free knowledge distillation. However, the resultant student models show a significant drop in accuracy compared to models trained on real data. We investigate possible causes for this drop and focus on the role of the different layers of the student model. By training these layers using either real or synthetic data, we reveal that the drop mainly stems from the model's final layers. Further, we briefly investigate other factors, such as differences in data normalization between synthetic and real data, the impact of data augmentations, texture vs. shape learning, and assuming oracle prompts. While we find that some of those factors can have an impact, they are not sufficient to close the gap towards real data. Building upon our insights that mainly later layers are responsible for the drop, we investigate the data-efficiency of fine-tuning a synthetically trained model with real data applied to only those last layers. Our results suggest an improved trade-off between the amount of real training data used and the model's accuracy. Our findings contribute to the understanding of the gap between synthetic and real data and indicate solutions to mitigate the scarcity of labeled real data.

SE · Aug 25, 2021
Hybrid Planning with Receding Horizon: A Case for Meta-self-awareness

Sona Ghahremani, Holger Giese

The trade-off between the quality and timeliness of adaptation is a multi-faceted challenge in engineering self-adaptive systems. Obtaining adaptation plans that fulfill system objectives with high utility and in a timely manner is the holy grail; however, as recent research has revealed, achieving it is not trivial. Hybrid planning is concerned with resolving the time and quality trade-off by dynamically combining multiple planners that individually aim to perform either timely or with high quality. The choice of the most fitting planner is steered based on assessments of runtime information. A hybrid planner for a self-adaptive system requires (i) a decision-making mechanism that utilizes (ii) system-level as well as (iii) feedback control-level information at runtime. In this paper, we present HYPEZON, a hybrid planner for self-adaptive systems. Inspired by model predictive control, HYPEZON leverages receding horizon control to utilize runtime information during its decision-making. Moreover, we propose to engineer HYPEZON for self-adaptive systems via two alternative designs that conform to meta-self-aware architectures. Meta-self-awareness allows for obtaining knowledge and reasoning about one's own awareness by adding a higher-level reasoning entity. HYPEZON aims to address the problem of hybrid planning by considering it as a case for meta-self-awareness.

SE · Jun 15, 2021
Probabilistic Metric Temporal Graph Logic

Sven Schneider, Maria Maximova, Holger Giese

Cyber-physical systems often encompass complex concurrent behavior with timing constraints and probabilistic failures on demand. The analysis of whether such systems with probabilistic timed behavior adhere to a given specification is essential. When the states of the system can be represented by graphs, the rule-based formalism of Probabilistic Timed Graph Transformation Systems (PTGTSs) can be used to suitably capture structure dynamics as well as probabilistic and timed behavior of the system. The model checking support for PTGTSs w.r.t. properties specified using Probabilistic Timed Computation Tree Logic (PTCTL) has already been presented. Moreover, for timed graph-based runtime monitoring, Metric Temporal Graph Logic (MTGL) has been developed for stating metric temporal properties on identified subgraphs and their structural changes over time. In this paper, we (a) extend MTGL to the Probabilistic Metric Temporal Graph Logic (PMTGL) by allowing for the specification of probabilistic properties, (b) adapt our MTGL satisfaction checking approach to PTGTSs, and (c) combine the approaches for PTCTL model checking and MTGL satisfaction checking to obtain a Bounded Model Checking (BMC) approach for PMTGL. In our evaluation, we apply an implementation of our BMC approach in AutoGraph to a running example.

SE · Aug 10, 2020
A Scalable Querying Scheme for Memory-efficient Runtime Models with History

Lucas Sakizloglou, Sona Ghahremani, Matthias Barkowsky et al.

Runtime models provide a snapshot of a system at runtime at a desired level of abstraction. Via a causal connection to the modeled system and by employing model-driven engineering techniques, runtime models support schemes for (runtime) adaptation where data from previous snapshots facilitates more informed decisions. Nevertheless, although runtime models and model-based adaptation techniques have been the focus of extensive research, schemes that treat the evolution of the model over time as a first-class citizen have only lately received attention. Consequently, there is a lack of sophisticated technology for such runtime models with history. We present a querying scheme where the integration of temporal requirements with incremental model queries enables scalable querying for runtime models with history. Moreover, our scheme provides for a memory-efficient storage of such models. By integrating these two features into an adaptation loop, we enable efficient history-aware self-adaptation via runtime models, of which we present an implementation.

SE · May 20, 2020
Improving Scalability and Reward of Utility-Driven Self-Healing for Large Dynamic Architectures

Sona Ghahremani, Holger Giese, Thomas Vogel

Self-adaptation can be realized in various ways. Rule-based approaches prescribe the adaptation to be executed if the system or environment satisfies certain conditions. They result in scalable solutions but often with merely satisfying adaptation decisions. In contrast, utility-driven approaches determine optimal decisions by using an often costly optimization, which typically does not scale for large problems. We propose a rule-based and utility-driven adaptation scheme that achieves the benefits of both directions such that the adaptation decisions are optimal, whereas the computation scales by avoiding an expensive optimization. We use this adaptation scheme for architecture-based self-healing of large software systems. For this purpose, we define the utility for large dynamic architectures of such systems based on patterns that define issues the self-healing must address. Moreover, we use pattern-based adaptation rules to resolve these issues. Using a pattern-based scheme to define the utility and adaptation rules allows us to compute the impact of each rule application on the overall utility and to realize an incremental and efficient utility-driven self-healing. In addition to formally analyzing the computational effort and optimality of the proposed scheme, we thoroughly demonstrate its scalability and optimality in terms of reward in comparative experiments with a static rule-based approach as a baseline and a utility-driven approach using a constraint solver. These experiments are based on different failure profiles derived from real-world failure logs. We also investigate the impact of different failure profile characteristics on the scalability and reward to evaluate the robustness of the different approaches.
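The idea of computing the utility impact of each rule application and selecting repairs without a global optimization step can be sketched as follows. This is an illustration only, not the scheme from the paper: the `Rule` and `Issue` types, the `utility_gain` and `cost` fields, and the gain-per-cost ordering are hypothetical, and every `cost` is assumed to be positive.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Rule:
    name: str
    utility_gain: float  # hypothetical: utility restored when the rule resolves the issue
    cost: float          # hypothetical: execution cost of the rule, assumed > 0


@dataclass(frozen=True)
class Issue:
    component: str                     # component affected by the matched issue pattern
    candidate_rules: Tuple[Rule, ...]  # rules whose pattern matches this issue


def plan_repairs(issues: List[Issue]) -> List[Tuple[str, Rule]]:
    """Pick the best rule per issue locally, then order repairs by gain per cost."""
    chosen = []
    for issue in issues:
        best = max(issue.candidate_rules, key=lambda r: r.utility_gain / r.cost)
        chosen.append((issue.component, best))
    chosen.sort(key=lambda pair: pair[1].utility_gain / pair[1].cost, reverse=True)
    return chosen


issues = [
    Issue("A", (Rule("restart", 10.0, 5.0), Rule("redeploy", 12.0, 12.0))),
    Issue("B", (Rule("restart", 20.0, 5.0),)),
]
for component, rule in plan_repairs(issues):
    print(component, rule.name)
```

Because each issue is evaluated independently, the work grows with the number of matched issues rather than with the size of a global search space, which is the intuition behind avoiding the constraint solver in this sketch.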

SE · May 15, 2020
Collective Risk Minimization via a Bayesian Model for Statistical Software Testing

Joachim Haensel, Christian M. Adriano, Johannes Dyck et al.

In the last four years, the number of distinct autonomous vehicle platforms deployed in the streets of California increased 6-fold, while the reported accidents increased 12-fold. This can become a trend with no signs of subsiding as it is fueled by a constant stream of innovations in hardware sensors and machine learning software. Meanwhile, if we expect the public and regulators to trust the autonomous vehicle platforms, we need to find better ways to solve the problem of adding technological complexity without increasing the risk of accidents. We studied this problem from the perspective of reliability engineering, in which a given risk of an accident has a severity and a probability of occurring. Timely information on accidents is important for engineers to anticipate and reuse previous failures to approximate the risk of accidents in a new city. However, this is challenging in the context of autonomous vehicles because of the sparse nature of data on the operational scenarios (driving trajectories in a new city). Our approach was to mitigate data sparsity by reducing the state space through monitoring of multi-vehicle operations. We then minimized the risk of accidents by determining a proper allocation of tests for each equivalence class. Our contributions comprise (1) a set of strategies to monitor the operational data of multiple autonomous vehicles, (2) a Bayesian model that estimates changes in the risk of accidents, and (3) a feedback control loop that minimizes these risks by reallocating test effort. Our results are promising in the sense that we were able to measure and control risk for a diversity of changes in the operational scenarios. We evaluated our models with data from two real cities with distinct traffic patterns and made the data available for the community.

SE · Apr 7, 2020
Towards Highly Scalable Runtime Models with History

Lucas Sakizloglou, Sona Ghahremani, Thomas Brand et al.

Advanced systems such as IoT comprise many heterogeneous, interconnected, and autonomous entities operating in often highly dynamic environments. Due to their large scale and complexity, large volumes of monitoring data are generated and need to be stored, retrieved, and mined in a time- and resource-efficient manner. Architectural self-adaptation automates the control, orchestration, and operation of such systems. This can only be achieved via sophisticated decision-making schemes supported by monitoring data that fully captures the system behavior and its history. Employing model-driven engineering techniques, we propose a highly scalable, history-aware approach to store and retrieve monitoring data in the form of enriched runtime models. We take advantage of rule-based adaptation where change events in the system trigger adaptation rules. We first present a scheme to incrementally check model queries in the form of temporal logic formulas, which represent the conditions of adaptation rules, against a runtime model with history. Then we enhance the model to retain only information that is temporally relevant to the queries, thereby reducing the accumulation of information to a required minimum. Finally, we demonstrate the feasibility and scalability of our approach via experiments on a simulated smart healthcare system employing a real-world medical guideline.

SE · May 17, 2018
A Testing Scheme for Self-Adaptive Software Systems with Architectural Runtime Models

Joachim Hänsel, Thomas Vogel, Holger Giese

Self-adaptive software systems (SASS) are equipped with feedback loops to adapt autonomously to changes of the software or environment. In established fields, such as embedded software, sophisticated approaches have been developed to systematically study feedback loops early during the development. In order to cover the particularities of feedback, techniques like one-way and in-the-loop simulation and testing have been included. However, a related approach to systematically test SASS is currently lacking. In this paper we therefore propose a systematic testing scheme for SASS that allows engineers to test the feedback loops early in the development by exploiting architectural runtime models. These models that are available early in the development are commonly used by the activities of a feedback loop at runtime and they provide a suitable high-level abstraction to describe test inputs as well as expected test results. We further outline our ideas with some initial evaluation results by means of a small case study.

SE · May 17, 2018
Model-Driven Engineering of Self-Adaptive Software with EUREMA

Thomas Vogel, Holger Giese

The development of self-adaptive software requires the engineering of an adaptation engine that controls the underlying adaptable software by feedback loops. The engine often describes the adaptation by runtime models representing the adaptable software and by activities such as analysis and planning that use these models. To systematically address the interplay between runtime models and adaptation activities, runtime megamodels have been proposed. A runtime megamodel is a specific model capturing runtime models and adaptation activities. In this article, we go one step further and present an executable modeling language for ExecUtable RuntimE MegAmodels (EUREMA) that eases the development of adaptation engines by following a model-driven engineering approach. We provide a domain-specific modeling language and a runtime interpreter for adaptation engines, in particular feedback loops. Megamodels are kept alive at runtime and by interpreting them, they are directly executed to run feedback loops. Additionally, they can be dynamically adjusted to adapt feedback loops. Thus, EUREMA supports development by making feedback loops explicit at a higher level of abstraction and it enables solutions where multiple feedback loops interact or operate on top of each other and self-adaptation co-exists with offline adaptation for evolution.

SE · May 17, 2018
A language for feedback loops in self-adaptive systems: Executable runtime megamodels

Thomas Vogel, Holger Giese

The development of self-adaptive software requires the engineering of proper feedback loops where an adaptation logic controls the underlying software. The adaptation logic often describes the adaptation by using runtime models representing the underlying software and steps such as analysis and planning that operate on these runtime models. To systematically address this interplay, runtime megamodels have been proposed: these are specific runtime models that themselves have runtime models as their elements and that also capture the relationships between multiple runtime models. In this paper, we go one step further and present a modeling language for runtime megamodels that considerably eases the development of the adaptation logic by providing a domain-specific modeling approach and a runtime interpreter for this part of a self-adaptive system. This supports development by modeling the feedback loops explicitly and at a higher level of abstraction. Moreover, it permits building complex solutions where multiple feedback loops interact or operate on top of each other, which is made possible by keeping the megamodels explicit and alive at runtime and by interpreting them.

SE · May 17, 2018
The Role of Models and Megamodels at Runtime

Thomas Vogel, Andreas Seibel, Holger Giese

In model-driven software development a multitude of interrelated models are used to systematically realize a software system. This results in a complex development process since the models and the relations between the models have to be managed. Similar problems appear when following a model-driven approach for managing software systems at runtime. A multitude of interrelated runtime models are employed simultaneously, and thus they have to be maintained at runtime. While for the development case megamodels have emerged to address the problem of managing models and relations, the problem is rather neglected for the case of runtime models by applying ad-hoc solutions. Therefore, we propose to utilize megamodel concepts for the case of multiple runtime models. Based on the current state of research, we present a categorization of runtime models and conceivable relations between them. The categorization describes the role of interrelated models at runtime and demonstrates that several approaches already employ multiple runtime models and relations. Then, we show how megamodel concepts help in organizing and utilizing runtime models and relations in a model-driven manner while supporting a high level of automation. Finally, the role of interrelated models and megamodels at runtime is discussed for self-adaptive software systems and exemplified by a case study.

SE · May 17, 2018
Requirements and Assessment of Languages and Frameworks for Adaptation Models

Thomas Vogel, Holger Giese

Approaches to self-adaptive software systems use models at runtime to leverage benefits of model-driven engineering (MDE) for providing views on running systems and for engineering feedback loops. Most of these approaches focus on causally connecting runtime models and running systems, and just apply typical MDE techniques, like model transformation, or well-known techniques from fields other than MDE, like event-condition-action rules, to realize a feedback loop. However, elaborating requirements for feedback loop activities for the specific case of runtime models is rather neglected. Therefore, we investigate requirements for Adaptation Models that specify the analysis, decision-making, and planning of adaptation as part of a feedback loop. In particular, we consider requirements for a modeling language of adaptation models and for a framework as the execution environment of adaptation models. Moreover, we discuss patterns for using adaptation models within the feedback loop regarding the structuring of loop activities and the implications on the requirements for adaptation models. Finally, we assess two existing approaches to adaptation models concerning their fitness for the requirements discussed in this paper.

SE · May 17, 2018
Adaptation and Abstract Runtime Models

Thomas Vogel, Holger Giese

Runtime adaptability is often a crucial requirement for today's complex software systems. Several approaches use an architectural model as a runtime representation of a managed system for monitoring, reasoning and performing adaptation. To ease the causal connection between a system and a model, these models are often closely related to the implementation and at a rather low level of abstraction. This makes them as complex as the implementation and it impedes reusability and extensibility of autonomic managers. Moreover, the models often do not cover different concerns, like security or performance, and therefore they do not support several self-management capabilities at once. In this paper we propose a model-driven approach that provides multiple architectural runtime models at different levels of abstraction as a basis for adaptation. Each runtime model abstracts from the underlying system and platform leveraging reusability and extensibility of managers that work on these models. Moreover, each model focuses on a specific concern which simplifies the work of autonomic managers. The different models are maintained automatically at runtime using model-driven engineering techniques that also reduce development efforts. Our approach has been implemented for the broadly adopted Enterprise Java Beans component standard and its application is presented in a self-healing scenario requiring structural adaptation.

SE · May 17, 2018
Model-Driven Architectural Monitoring and Adaptation for Autonomic Systems

Thomas Vogel, Stefan Neumann, Stephan Hildebrandt et al.

Architectural monitoring and adaptation allow self-management capabilities of autonomic systems to realize more powerful adaptation steps, which observe and adjust not only parameters but also the software architecture. However, monitoring as well as adaptation of the architecture of a running system in addition to the parameters are considerably more complex, and only rather limited and costly solutions are available today. In this paper we propose a model-driven approach to ease the development of architectural monitoring and adaptation for autonomic systems. Using meta models and model transformation techniques, we were able to realize an incremental synchronization between the run-time system and models for different self-management activities. The synchronization might be triggered when needed and therefore the activities can operate concurrently.

SE · May 9, 2018
Efficient Utility-Driven Self-Healing Employing Adaptation Rules for Large Dynamic Architectures

Sona Ghahremani, Holger Giese, Thomas Vogel

Self-adaptation can be realized in various ways. Rule-based approaches prescribe the adaptation to be executed if the system or environment satisfies certain conditions and result in scalable solutions, however, with often only satisfying adaptation decisions. In contrast, utility-driven approaches determine optimal adaptation decisions by using an often costly optimization step, which typically does not scale well for larger problems. We propose a rule-based and utility-driven approach that achieves the beneficial properties of each of these directions such that the adaptation decisions are optimal while the computation remains scalable since an expensive optimization step can be avoided. The approach can be used for the architecture-based self-healing of large software systems. We define the utility for large dynamic architectures of such systems based on patterns capturing issues the self-healing must address and we use pattern-based adaptation rules to resolve the issues. Defining both the utility and the adaptation rules in a pattern-based manner allows us to compute the impact of each rule application on the overall utility and to realize an incremental and efficient utility-driven self-healing. We demonstrate the efficiency and optimality of our scheme in comparative experiments with a static rule-based scheme as a baseline and a utility-driven approach using a constraint solver.