Matthias Tichy

SE
h-index: 20
8 papers
10 citations
Novelty: 32%
AI Score: 46

8 Papers

SE · Apr 17 · Code
Supporting the Comprehension of Data Analysis Scripts

Florian Sihler, Oliver Gerstl, Lars Pfrenger et al.

A lot of research relies on data analysis scripts to process, clean, and visualize data. However, recent studies show that these scripts are often hard to comprehend and maintain, hindering reproducibility and reuse, accompanied by a lack of tool support for handling such scripts. In this work, we focus on the R programming language, addressing this problem by presenting flowR as an extension for the common data analysis IDEs Positron and VS Code. Alongside a previously presented static backward program slicer, flowR provides an overview of data analysis scripts, interactive graph visualizations, linting, and inline value annotations to support data analysts. FlowR incrementally analyzes R projects by intertwining interprocedural data- and control-flow analyses to build a comprehensive dataflow graph, incorporating R's dynamic and explorative features. Additionally, flowR offers a plugin system and interfaces, allowing the integration of further analyses, such as new linting rules or custom visualizations. FlowR requires an average of 576 ms to calculate the full dataflow graph of real-world projects, which enables near real-time feedback. The demonstration video is available at https://youtu.be/hJzr-r-NmMg . For the full source code and extensive documentation, refer to https://github.com/flowr-analysis/flowr . To try the Docker image, use `docker run --rm -it eagleoutice/flowr`.
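To make the idea of backward slicing on a dataflow graph concrete, here is a minimal Python sketch. It is not flowR's implementation or API: the graph representation (a dictionary mapping each node to the nodes it depends on), the function name `backward_slice`, and the four-line example script are all assumptions made for illustration.

```python
from collections import deque


def backward_slice(depends_on, criterion):
    """Return every node the criterion transitively depends on, itself included.

    `depends_on` is a hypothetical dataflow graph: node -> nodes it reads from.
    """
    seen = {criterion}
    queue = deque([criterion])
    while queue:
        node = queue.popleft()
        for dep in depends_on.get(node, ()):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen


# Hypothetical R script, one graph node per line:
#   1: x <- 1
#   2: y <- 2
#   3: z <- x + 1
#   4: print(z)
graph = {1: [], 2: [], 3: [1], 4: [3]}

# Slicing for line 4 keeps lines 1, 3, and 4; line 2 is irrelevant to print(z).
slice_lines = sorted(backward_slice(graph, 4))
```

A real analysis for R would also have to resolve the dynamic features mentioned in the abstract before such a traversal is meaningful; the sketch only shows the final reachability step.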

SE · Apr 3
Combining Static Code Analysis and Large Language Models Improves Correctness and Performance of Algorithm Recognition

Denis Neumüller, Sebastian Boll, David Schüler et al.

Context: Since it is well-established that developers spend a substantial portion of their time understanding source code, the ability to automatically identify algorithms within source code presents a valuable opportunity. This capability can support program comprehension, facilitate maintenance, and enhance overall software quality. Objective: We empirically evaluate how combining LLMs with static code analysis can improve the automated recognition of algorithms, while also evaluating their standalone performance and dependence on identifier names. Method: We perform multiple experiments evaluating the combination of LLMs with static analysis using different filter patterns. We compare this combined approach against their standalone performance under various prompting strategies and investigate the impact of systematic identifier obfuscation on classification performance and runtime. Results: The combination of LLMs with lightweight static analysis performs surprisingly well, reducing required LLM calls by 72.39-97.50% depending on the filter pattern. This not only lowers runtime significantly but also improves F1-scores by up to 12 percentage points (pp) compared to the baseline. Regarding the different prompting strategies, in-context learning with two examples provides an effective trade-off between classification performance and runtime efficiency, achieving F1-scores of 75-77% with only a modest increase in inference time. Lastly, we find that LLMs are not solely dependent on name-information as they are still able to identify most algorithm implementations when identifiers are obfuscated. Conclusion: By combining LLMs with static analysis, we achieve substantial reductions in runtime while simultaneously improving F1-scores, underscoring the value of a hybrid approach.
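The hybrid setup described in this abstract can be sketched as a cheap static filter that decides which snippets are sent to the model at all. The following Python sketch is an illustration only: the filter (`contains_loop`), the label `"none"`, and the stub classifier are assumptions, not the filter patterns or prompts used in the paper.

```python
import re


def contains_loop(source):
    """Hypothetical filter pattern: keep only snippets that contain a loop keyword."""
    return re.search(r"\b(for|while)\b", source) is not None


def classify_with_filter(snippets, passes_filter, llm_classify):
    """Call the (expensive) classifier only for snippets that pass the static filter."""
    results = {}
    calls = 0
    for name, source in snippets.items():
        if passes_filter(source):
            results[name] = llm_classify(source)
            calls += 1
        else:
            # Filtered out statically: no model call is spent on this snippet.
            results[name] = "none"
    return results, calls


def stub_llm(source):
    """Stand-in for a real model call; always answers with the same label."""
    return "some_algorithm"


snippets = {
    "getter": "int getX() { return x; }",
    "setter": "void setX(int v) { x = v; }",
    "sum": "for (int i = 0; i < n; i++) { s += a[i]; }",
}

results, calls = classify_with_filter(snippets, contains_loop, stub_llm)
saved = 1 - calls / len(snippets)  # fraction of model calls avoided
```

In this toy input, two of the three snippets never reach the classifier. How much a filter saves on real code depends entirely on the pattern and the code base.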

CR · Apr 6 · Code
Bridging Safety and Security in Complex Systems: A Model-Based Approach with SAFT-GT Toolchain

Irdin Pekaric, Raffaela Groner, Alexander Raschke et al.

In the rapidly evolving landscape of software engineering, the demand for robust and secure systems has become increasingly critical. This is especially true for self-adaptive systems due to their complexity and the dynamic environments in which they operate. To address this issue, we designed and developed the SAFT-GT toolchain that tackles the multifaceted challenges associated with ensuring both safety and security. This paper provides a comprehensive description of the toolchain's architecture and functionalities, including the Attack-Fault Trees generation and model combination approaches. We emphasize the toolchain's ability to integrate seamlessly with existing systems, allowing for enhanced safety and security analyses without requiring extensive modifications and domain knowledge. Our proposed approach can address evolving security threats, including both known vulnerabilities and emerging attack vectors that could compromise the system. As a use case for the toolchain, we integrate it into the feedback loop of self-adaptive systems. Finally, to validate the practical applicability of the toolchain, we conducted an extensive user study involving domain experts, whose insights and feedback underscore the toolchain's relevance and usability in real-world scenarios. Our findings demonstrate the toolchain's effectiveness in real-world applications while highlighting areas for future improvements. The toolchain and associated resources are available in an open-source repository to promote reproducibility and encourage further research in this field.

SE · May 7
Exploring the Effectiveness of Abstract Syntax Tree Patterns for Algorithm Recognition

Denis Neumüller, Florian Sihler, Raphael Straub et al.

The automated recognition of algorithm implementations can support many software maintenance and re-engineering activities by providing knowledge about the concerns present in the code base. Moreover, recognizing inefficient algorithms like Bubble Sort and suggesting superior alternatives from a library can help in assessing and improving the quality of a system. Approaches from related work suffer from usability as well as scalability issues, and their accuracy is not evaluated. In this paper, we investigate how well our approach based on the abstract syntax tree of a program performs for automatic algorithm recognition. To this end, we have implemented a prototype consisting of a domain-specific language designed to capture the key features of an algorithm and used to express a search pattern on the abstract syntax tree, a matching algorithm to find these features, and an initial catalog of "ready to use" patterns. To create our search patterns, we performed a web search using the algorithm name and described key features of the found reference implementations with our domain-specific language. We evaluate our prototype on a subset of the BigCloneEval benchmark containing algorithms like Fibonacci, Bubble Sort, and Binary Search. We achieve an average F1-score of 0.74, outperforming the large language model Code Llama, which attains 0.35. Additionally, we use multiple code clone detection tools as a baseline for comparison, achieving a recall of 0.62 while the best-performing tool reaches 0.20.
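As a rough illustration of matching key features on an abstract syntax tree, the sketch below uses Python's standard `ast` module (Python 3.9 or newer for `ast.unparse`). It does not reproduce the paper's domain-specific language or matcher. The chosen features (a loop nested in a loop that contains a two-element subscript swap) and the function names are assumptions, and the pattern is deliberately crude: it would also match other swap-based sorts.

```python
import ast

LOOPS = (ast.For, ast.While)


def is_subscript_swap(node):
    """Match tuple swaps of two subscripts, e.g. `a[j], a[j + 1] = a[j + 1], a[j]`."""
    if not (isinstance(node, ast.Assign) and len(node.targets) == 1):
        return False
    target, value = node.targets[0], node.value
    if not (isinstance(target, ast.Tuple) and isinstance(value, ast.Tuple)):
        return False
    if len(target.elts) != 2 or len(value.elts) != 2:
        return False
    if not all(isinstance(e, ast.Subscript) for e in target.elts + value.elts):
        return False
    # Compare source text so that Load/Store contexts do not matter.
    left, right = (ast.unparse(e) for e in target.elts)
    new_left, new_right = (ast.unparse(e) for e in value.elts)
    return left == new_right and right == new_left and left != right


def looks_like_bubble_sort(source):
    """Hypothetical pattern: a loop inside a loop whose body swaps two elements."""
    tree = ast.parse(source)
    for outer in ast.walk(tree):
        if not isinstance(outer, LOOPS):
            continue
        for inner in ast.walk(outer):
            if inner is outer or not isinstance(inner, LOOPS):
                continue
            if any(is_subscript_swap(n) for n in ast.walk(inner)):
                return True
    return False


BUBBLE = """
def sort(a):
    for i in range(len(a)):
        for j in range(len(a) - 1 - i):
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
    return a
"""

LINEAR = """
def total(a):
    s = 0
    for x in a:
        s += x
    return s
"""

found_bubble = looks_like_bubble_sort(BUBBLE)
found_linear = looks_like_bubble_sort(LINEAR)
```

The first snippet matches and the second does not. A usable pattern catalog would need more discriminating features per algorithm than this single heuristic.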

AI · May 9, 2025
Pseudo-Boolean d-DNNF Compilation for Expressive Feature Modeling Constructs

Chico Sundermann, Stefan Vill, Elias Kuiter et al.

Configurable systems typically consist of reusable assets that have dependencies between each other. To specify such dependencies, feature models are commonly used. As feature models in practice are often complex, automated reasoning is typically employed to analyze the dependencies. Here, the de facto standard is translating the feature model to conjunctive normal form (CNF) to enable employing off-the-shelf tools, such as SAT or #SAT solvers. However, modern feature-modeling dialects often contain constructs, such as cardinality constraints, that are ill-suited for conversion to CNF. This mismatch between the input of reasoning engines and the available feature-modeling dialects limits the applicability of the more expressive constructs. In this work, we narrow this gap between expressive constructs and scalable automated reasoning. Our contribution is twofold: First, we provide a pseudo-Boolean encoding for feature models, which facilitates smaller representations of commonly employed constructs compared to Boolean encoding. Second, we propose a novel method to compile pseudo-Boolean formulas to Boolean d-DNNF. With the compiled d-DNNFs, we can resort to a plethora of efficient analyses already used in feature modeling. Our empirical evaluation shows that our proposal substantially outperforms the state of the art based on CNF inputs for expressive constructs. For every considered dataset representing different feature models and feature-modeling constructs, the feature models can be translated to pseudo-Boolean significantly faster than to CNF. Overall, deriving d-DNNFs from a feature model with the targeted expressive constraints can be substantially accelerated using our pseudo-Boolean approach. Furthermore, our approach is competitive on feature models with only basic constructs.
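A small worked example may help show why cardinality constraints are awkward in CNF. The Python sketch below compares a naive (binomial) CNF encoding of "at most k of n features" with a single pseudo-Boolean constraint. The encodings and function names are illustrative assumptions, not the encoding proposed in the paper, and more compact CNF encodings with auxiliary variables exist.

```python
from itertools import combinations
from math import comb


def at_most_k_cnf(variables, k):
    """Naive CNF: in every subset of k + 1 variables, at least one must be false.

    Variables are positive integers; a negative integer denotes a negated literal.
    """
    return [[-v for v in subset] for subset in combinations(variables, k + 1)]


def at_most_k_pb(variables, k):
    """One pseudo-Boolean constraint: sum of (coefficient * variable) <= k."""
    return ([(1, v) for v in variables], "<=", k)


def satisfies_cnf(clauses, true_vars):
    """Check an assignment (set of true variables) against a list of clauses."""
    return all(
        any((lit > 0) == (abs(lit) in true_vars) for lit in clause)
        for clause in clauses
    )


variables = list(range(1, 11))  # ten hypothetical features
cnf = at_most_k_cnf(variables, 3)
pb = at_most_k_pb(variables, 3)

cnf_size = len(cnf)  # C(10, 4) clauses
pb_size = 1          # a single linear constraint
```

For ten features and k = 3, the naive CNF needs C(10, 4) = 210 clauses, while the pseudo-Boolean form is one linear inequality over the same ten variables.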

SE · Jan 31, 2022
Advantages and Disadvantages of (Dedicated) Model Transformation Languages: A Qualitative Interview Study

Stefan Höppner, Yves Haas, Matthias Tichy et al.

Model-driven development envisages the use of model transformations to evolve models. Model transformation languages, developed for this task, are touted as having many benefits over general-purpose programming languages. However, a large number of these claims have not yet been substantiated. They are also made without the context necessary to critically assess their merit or build meaningful empirical studies around them. The objective of our work is to elicit the reasoning, influences, and background knowledge that lead people to assume benefits or drawbacks of model transformation languages. We conducted a large-scale interview study involving 56 participants from research and industry. Interviewees were presented with claims about model transformation languages and were asked to provide reasons for their assessment thereof. We qualitatively analysed the responses to find factors that influence the properties of model transformation languages as well as explanations as to how exactly they do so. Our interviews show that the general-purpose expressiveness of GPLs, the domain-specific capabilities of MTLs, and tooling all have strong influences on how people view properties of model transformation languages. Moreover, the Choice of MTL, the Use Case for which a transformation should be developed, and the Skills of involved stakeholders have a moderating effect on the influences by changing the context to consider. There is a broad body of experience that suggests positive and negative influences for properties of MTLs. Our data suggests that much needs to be done in order to convey the viability of model transformation languages. Efforts to provide more empirical substance need to be undertaken, and lackluster language capabilities and tooling need to be improved upon. We suggest several approaches for this that can be based on the results of the presented study.

SE · Sep 24, 2021
A Domain-Specific Language for Modeling and Analyzing Solution Spaces for Technology Roadmapping

Alexander Breckel, Jakob Pietron, Katharina Juhnke et al.

The introduction of major innovations in industry requires a collaboration across the whole value chain. A common way to organize such a collaboration is the use of technology roadmaps, which act as an industry-wide long-term planning tool. Technology roadmaps are used to identify industry needs, estimate the availability of technological solutions, and identify the need for innovation in the future. Roadmaps are inherently both time-dependent and based on uncertain values, i.e., properties and structural components can change over time. Furthermore, roadmaps have to reason about alternative solutions as well as their key performance indicators. Current approaches for model-based engineering do not inherently support these aspects. We present a novel model-based approach treating those aspects as first-class citizens. To address the problem of missing support for time in the context of roadmap modeling, we introduce the concepts of a common global time, time-dependent properties, and time-dependent availability. This includes requirements, properties, and the structure of the model or its components as well. Furthermore, we support the specification and analysis of key performance indicators for alternative solutions. These concepts result in a continuous range of various valid models over time instead of a single valid model at a certain point of time. We present a graphical user interface to enable the user to efficiently create and analyze those models. We further show the semantics of the resulting model by a translation into a set of global constraints as well as how we solve the resulting constraint system. We report on the evaluation of these concepts and the Iris tool with domain experts from different companies in the automotive value chain based on the industrial case of a smart sensing electrical fuse.

SE · Dec 4, 2018
Verlässliche Software im 21. Jahrhundert (Dependable Software in the 21st Century)

Stefan Wagner, Matthias Tichy, Michael Felderer et al.

Software is the main innovation driver in many different areas, like cloud services, autonomous driving, connected medical devices, and high-frequency trading. All these areas have in common that they require high dependability. In this paper, we discuss challenges and research directions imposed by these new areas on guaranteeing dependability. On the one hand, challenges include characteristics of the systems themselves, e.g., open systems and ad-hoc structures. On the other hand, we see new aspects of dependability like behavioral traceability.