Frank Elberzhager

SE
h-index: 14
Papers: 10
Citations: 74
Novelty: 29%
AI Score: 32

10 Papers

SE · Oct 27, 2025 · Code
Evaluating the effectiveness of LLM-based interoperability

Rodrigo Falcão, Stefan Schweitzer, Julien Siebert et al.

Background: Systems of systems are becoming increasingly dynamic and heterogeneous, which adds pressure to the long-standing challenge of interoperability. Besides its technical aspect, interoperability also has an economic side, as development effort is required to build the interoperability artifacts. Objectives: Given the recent advances in large language models (LLMs), we aim to analyze the effectiveness of LLM-based strategies for making systems interoperate autonomously, at runtime, without human intervention. Method: We selected 13 open source LLMs and curated four versions of a dataset in the agricultural interoperability use case. We performed three runs of each model with each version of the dataset, using two different strategies. Then we compared the effectiveness of the models and the consistency of their results across multiple runs. Results: qwen2.5-coder:32b was the most effective model using both strategies, DIRECT (average pass@1 >= 0.99) and CODEGEN (average pass@1 >= 0.89), in three out of four dataset versions. In the fourth dataset version, which included a unit conversion, all models using the DIRECT strategy failed, whereas qwen2.5-coder:32b using CODEGEN succeeded with an average pass@1 = 0.75. Conclusion: Some LLMs can make systems interoperate autonomously. Further evaluation in different domains is recommended, and further research on reliability strategies should be conducted.
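
The abstract reports an average pass@1 over three runs per model and dataset version. As a minimal sketch, assuming pass@1 means the fraction of tasks whose single generated answer passes, the averaging could look like the following. The function names and the outcome data are hypothetical illustrations, not taken from the paper.

```python
from statistics import mean


def pass_at_1(outcomes):
    """Fraction of tasks whose single generated answer passed."""
    if not outcomes:
        raise ValueError("outcomes must not be empty")
    return sum(1 for ok in outcomes if ok) / len(outcomes)


def average_pass_at_1(runs):
    """Mean pass@1 over repeated runs of one model on one dataset version."""
    return mean(pass_at_1(run) for run in runs)


# Hypothetical outcomes: three runs, four tasks each (True = passed).
runs = [
    [True, True, True, False],
    [True, True, True, True],
    [True, False, True, True],
]
print(average_pass_at_1(runs))
```

Comparing the per-run values, rather than only their mean, gives a rough view of the consistency across runs that the study also examines.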

SE · Aug 8, 2021
Tackling Consistency-related Design Challenges of Distributed Data-Intensive Systems - An Action Research Study

Susanne Braun, Stefan Deßloch, Eberhard Wolff et al.

Background: Distributed data-intensive systems are increasingly designed to be only eventually consistent. Persistent data is no longer processed with serialized and transactional access, exposing applications to a range of potential concurrency anomalies that need to be handled by the application itself. Controlling concurrent data access in monolithic systems is already challenging, but the problem is exacerbated in distributed systems. To make matters worse, the software architecture community provides little systematic engineering guidance on this issue. Aims: In this paper, we report on our study of the effectiveness and applicability of the novel design guidelines we are proposing in this regard. Method: We used action research and conducted it in the context of the software architecture design process of a multi-site platform development project. Results: Our hypotheses regarding effectiveness and applicability have been accepted in the context of the study. The initial design guidelines were refined throughout the study. Thus, we also contribute concrete guidelines for architecting distributed data-intensive systems with eventually consistent data. The guidelines are an advancement of Domain-Driven Design and provide additional patterns for the tactical design part. Conclusions: Based on our results, we recommend using the guidelines to architect safe eventually consistent systems. Because of the relevance of distributed data-intensive systems, we will drive this research forward and evaluate it in further domains.

SE · Jan 30, 2014
Focusing Testing by Using Inspection and Product Metrics

Frank Elberzhager, Stephan Kremer, Jürgen Münch et al.

A well-known approach for identifying defect-prone parts of software in order to focus testing is to use different kinds of product metrics such as size or complexity. Although this approach has been evaluated in many contexts, the question remains whether there are further opportunities to improve test focusing. One idea is to identify other types of information that may indicate the location of defect-prone software parts. Data from software inspections, in particular, appear to be promising. This kind of data might already point to software parts that have inherent difficulties or programming challenges and might, in consequence, be defect-prone. This article first explains how inspection and product metrics can be used to focus testing activities. Second, we compare selected product and inspection metrics commonly used to predict defect-prone parts (e.g., size and complexity metrics, inspection defect content metrics, and defect density metrics). Based on initial experience from two case studies performed in different environments, the suitability of different metrics for predicting defect-prone parts is illustrated. The studies revealed that inspection defect data seems to be a suitable predictor, and a combination of certain inspection and product metrics led to the best prioritizations in our contexts. In addition, qualitative experience is presented, which substantiates the expected benefit of using inspection results to optimize testing.
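
To make the idea of combining inspection and product metrics concrete, here is a minimal sketch of one possible prioritization. It assumes inspection defect density (inspection defects per line of code) as the primary criterion and size as a tie-breaker. This particular combination, the `Part` type, and the sample data are hypothetical; the abstract does not state which combinations the authors used.

```python
from dataclasses import dataclass


@dataclass
class Part:
    name: str
    loc: int                 # product metric: size in lines of code
    inspection_defects: int  # inspection metric: defects found in inspection


def defect_density(part):
    """Inspection defects per line of code."""
    return part.inspection_defects / part.loc


def prioritize(parts):
    """Rank parts for testing: highest defect density first, larger parts break ties."""
    return sorted(parts, key=lambda p: (defect_density(p), p.loc), reverse=True)


# Hypothetical data for three code parts.
parts = [Part("parser", 400, 8), Part("ui", 1200, 6), Part("io", 200, 5)]
for part in prioritize(parts):
    print(part.name, round(defect_density(part), 4))
```

With these made-up numbers, the small `io` part is tested first even though `ui` is the largest, which is the kind of reordering a size-only metric would miss.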

SE · Jan 13, 2014
Predicting Defect Content and Quality Assurance Effectiveness by Combining Expert Judgment and Defect Data - A Case Study

Michael Kläs, Haruka Nakao, Frank Elberzhager et al.

Planning quality assurance (QA) activities in a systematic way and controlling their execution are challenging tasks for companies that develop software or software-intensive systems. Both require estimation capabilities regarding the effectiveness of the applied QA techniques and the defect content of the checked artifacts. Existing approaches for these purposes need extensive measurement data from historical projects. Because many companies do not collect enough data to apply these approaches (especially in the early project lifecycle), they typically base their QA planning and controlling solely on expert opinion. This article presents a hybrid method that combines commonly available measurement data and context-specific expert knowledge. To evaluate the method's applicability and usefulness, we conducted a case study in the context of independent verification and validation activities for critical software in the space domain. A hybrid defect content and effectiveness model was developed for the software requirements analysis phase and evaluated with available legacy data. One major result is that the hybrid model provides improved estimation accuracy when compared to applicable models based solely on data. The mean magnitude of relative error (MMRE) determined by cross-validation is 29.6% compared to 76.5% obtained by the most accurate data-based model.
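
The accuracy figures above are expressed as MMRE. A minimal sketch of the standard definition, the mean of |actual - estimated| / actual over all observations, is shown below. The sample numbers are invented for illustration, and the cross-validation step used in the study is not reproduced here.

```python
def mmre(actual, estimated):
    """Mean magnitude of relative error: mean of |actual - estimated| / actual."""
    if not actual or len(actual) != len(estimated):
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("actual values must be non-zero")
    errors = [abs(a - e) / abs(a) for a, e in zip(actual, estimated)]
    return sum(errors) / len(errors)


# Hypothetical defect counts: actual vs. estimated for three artifacts.
actual = [10, 20, 40]
estimated = [12, 15, 40]
print(f"MMRE = {mmre(actual, estimated):.1%}")
```

Lower is better: an MMRE of 29.6% means the estimates deviate from the actual values by about 30% on average.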

SE · Jan 7, 2014
Transparent Combination of Expert and Measurement Data for Defect Prediction: An Industrial Case Study

Michael Kläs, Frank Elberzhager, Jürgen Münch et al.

Defining strategies on how to perform quality assurance (QA) and how to control such activities is a challenging task for organizations developing or maintaining software and software-intensive systems. Planning and adjusting QA activities could benefit from accurate estimations of the expected defect content of relevant artifacts and the effectiveness of important quality assurance activities. Combining expert opinion with commonly available measurement data in a hybrid way promises to overcome the weaknesses of purely data-driven or purely expert-based estimation methods. This article presents a case study of the hybrid estimation method HyDEEP for estimating defect content and QA effectiveness in the telecommunication domain. The specific focus of this case study is the use of the method for gaining quantitative predictions. This aspect has not been empirically analyzed in previous work. Among other things, the results show that for defect content estimation, the method performs statistically significantly better than purely data-based methods, with a relative error of 0.3 on average (MMRE).

SE · Dec 4, 2013
Using Early Quality Assurance Metrics to Focus Testing Activities

Frank Elberzhager, Jürgen Münch

Testing of software or software-based systems and services is considered one of the most effort-consuming activities in the lifecycle. This applies especially to those domains where highly iterative development and continuous integration cannot be applied. Several approaches have been proposed to use measurement as a means to improve test effectiveness and efficiency. Most of them rely on using product data, historical data, or in-process data that is not related to quality assurance activities. Very few approaches use data from early quality assurance activities such as inspection data in order to focus testing activities and thereby reduce test effort. This article gives an overview of potential benefits of using data from early defect detection activities, potentially in addition to other data, in order to focus testing activities. In addition, the article sketches an integrated inspection and testing process and its evaluation in the context of two case studies. Taking the study limitations into account, the results show an overall reduction of testing effort by up to 34%, which mirrors an efficiency improvement of up to about 50% for testing.
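
The link between a 34% effort reduction and an efficiency improvement of about 50% can be checked with a short calculation. The sketch below assumes that efficiency means defects found per unit of test effort and that the number of defects found stays the same. Both are assumptions made for illustration; the paper's own definition may differ.

```python
def efficiency_gain(effort_reduction):
    """Relative efficiency gain when the same defects are found with less effort.

    Efficiency is taken as defects / effort, so the gain is 1 / (1 - r) - 1
    for an effort reduction r.
    """
    if not 0 <= effort_reduction < 1:
        raise ValueError("effort_reduction must be in [0, 1)")
    return 1 / (1 - effort_reduction) - 1


print(f"{efficiency_gain(0.34):.1%}")  # prints 51.5%
```

Under these assumptions, spending 66% of the original effort for the same result yields roughly a 1.5-fold efficiency, consistent with the "about 50%" stated in the abstract.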

SE · Dec 3, 2013
The Relevance of Assumptions and Context Factors for the Integration of Inspections and Testing

Frank Elberzhager, Robert Eschbach, Jürgen Münch

Integrating inspection processes with testing processes promises to deliver several benefits, including reduced effort for quality assurance or higher defect detection rates. Systematic integration of these processes requires knowledge regarding the relationships between these processes, especially regarding the relationship between inspection defects and test defects. Such knowledge is typically context-dependent and needs to be gained analytically or empirically. If such knowledge is not available, assumptions need to be made for a specific context. This article describes the relevance of assumptions and context factors for integrating inspection and testing processes and provides mechanisms for deriving assumptions in a systematic manner.

SE · Dec 3, 2013
Guiding Testing Activities by Predicting Defect-prone Parts Using Product and Inspection Metrics

Frank Elberzhager, Stephan Kremer, Jürgen Münch et al.

Product metrics, such as size or complexity, are often used to identify defect-prone parts or to focus quality assurance activities. In contrast, quality information that is available early, such as information provided by inspections, is usually not used. Currently, little experience is documented in the literature on whether data from early defect detection activities can support the identification of defect-prone parts later in the development process. This article compares selected product and inspection metrics commonly used to predict defect-prone parts. Based on initial experience from two case studies performed in different environments, the suitability of different metrics for predicting defect-prone parts is illustrated. These studies revealed that inspection defect data seems to be a suitable predictor, and a combination of certain inspection and product metrics led to the best prioritizations in our contexts.

SE · Nov 25, 2013
Inspection and Test Process Integration Based on Explicit Test Prioritization Strategies

Frank Elberzhager, Alla Rosbach, Jürgen Münch et al.

Today's software quality assurance techniques are often applied in isolation. Consequently, synergies resulting from systematically integrating different quality assurance activities are often not exploited. Such combinations promise benefits, such as a reduction in quality assurance effort or higher defect detection rates. The integration of inspection and testing, for instance, can be used to guide testing activities. For example, testing activities can be focused on defect-prone parts based upon inspection results. Existing approaches for predicting defect-prone parts do not make systematic use of the results from inspections. This article gives an overview of an integrated inspection and testing approach, and presents a preliminary case study aiming at verifying a study design for evaluating the approach. First results from this preliminary case study indicate that synergies resulting from the integration of inspection and testing might exist, and show a trend that testing activities could be guided based on inspection results.

SE · Nov 15, 2013
Integrating Inspection and Test Processes Based on Context-Specific Assumptions

Frank Elberzhager, Jürgen Münch, Dieter Rombach et al.

Inspections and testing are two of the most commonly performed software quality assurance processes today. Typically, these processes are applied in isolation, which, however, fails to exploit the benefits of systematically combining and integrating them. In consequence, tests are not focused based on early defect detection data. Expected benefits of such process integration include higher defect detection rates or reduced quality assurance effort. Moreover, when conducting testing without any prior information regarding the system's quality, it is often unclear how to focus testing. A systematic integration of inspection and testing processes requires context-specific knowledge about the relationships between inspections and testing. This knowledge is typically not available and needs to be empirically identified and validated. Often, context-specific assumptions can be seen as a starting point for generating such knowledge. Based on the In2Test approach, which uses inspection data to focus testing, we present in this article how knowledge about the relationship between inspections and testing can be gained, documented, and evolved in an analytical or empirical manner. In addition, this article gives an overview of related work and highlights future research directions.