Wilhelm Hasselbring

SE
25papers
463citations
Novelty21%
AI Score43

25 Papers

PESep 26, 2022
Modeling Polyp Activity of Paragorgia arborea Using Supervised Learning

Arne Johanson, Sascha Flögel, Wolf-Christian Dullo et al.

While the distribution patterns of cold-water corals, such as Paragorgia arborea, have received increasing attention in recent studies, little is known about their in situ activity patterns. In this paper, we examine polyp activity in P. arborea using machine learning techniques to analyze high-resolution time series data and photographs obtained from an autonomous lander cluster deployed in the Stjernsund, Norway. An interactive illustration of the models derived in this paper is provided online as supplementary material. We find that the best predictor of the degree of extension of the coral polyps is current direction with a lag of three hours. Other variables that are not directly associated with water currents, such as temperature and salinity, offer much less information concerning polyp activity. Interestingly, the degree of polyp extension can be predicted more reliably by sampling the laminar flows in the water column above the measurement site than by sampling the more turbulent flows in the direct vicinity of the corals. Our results show that the activity patterns of the P. arborea polyps are governed by the strong tidal current regime of the Stjernsund. It appears that P. arborea does not react to shorter changes in the ambient current regime but instead adjusts its behavior in accordance with the large-scale pattern of the tidal cycle itself in order to optimize nutrient uptake.

16.4DBMar 20
AVOCADO: The Streaming Process Mining Challenge

Christian Imenkamp, Andrea Maldonado, Hendrik Reiter et al.

Streaming process mining deals with the real-time analysis of streaming data. Event streams require algorithms capable of processing data incrementally. To systematically address the complexities of this domain, we propose AVOCADO, a standardized challenge framework that provides clear structural divisions: separating the concept and instantiation layers of challenges in streaming process mining for algorithm evaluation. The AVOCADO evaluates algorithms on streaming-specific metrics like accuracy, Mean Absolute Error (MAE), Root Mean Square Error (RMSE), Processing Latency, and robustness. This initiative seeks to foster innovation and community-driven discussions to advance the field of streaming process mining. We present this framework as a foundation and invite the community to contribute to its evolution by suggesting new challenges, such as integrating metrics for system throughput and memory consumption, and expanding the scope to address real-world stream complexities like out-of-order event arrival.

7.0SEApr 19
Technology Research Software: An Often Overlooked Category of Research Software

Wilhelm Hasselbring, Daniel S. Katz, Rob van Nieuwpoort

Research software has been categorized for various goals. One fundamental dimension of such categorizations is the role that the software plays in the research process. Recently, a new role category has emerged: technology research software, which covers research software developed in technology research. Until now, this category of technology research software has often been overlooked and neglected within the research software engineering community. In this article, we explain technology research software and its primary subroles. Technology readiness levels are an established method of estimating the maturity of technologies, including software systems. For technology research software, these readiness levels define secondary subroles. To illustrate the concept of technology research software and to make it more tangible, we present examples of research software that, depending on its specific use within or outside of research, take on the role of technology research software as well as that of another research software category.

SESep 29, 2021Code
Live Visualization of Dynamic Software Cities with Heat Map Overlays

Alexander Krause, Malte Hansen, Wilhelm Hasselbring

The 3D city metaphor in software visualization is a well-explored rendering method. Numerous tools use their custom variation to visualize offline-analyzed data. Heat map overlays are one of these variants. They introduce a separate information layer in addition to the software city's own semantics. Results show that their usage facilitates program comprehension. In this paper, we present our heat map approach for the city metaphor visualization based on live trace analysis. In comparison to previous approaches, our implementation uses live dynamic analysis of a software system's runtime behavior. At any time, users can toggle the heat map feature and choose which runtime-dependent metric the heat map should visualize. Our approach continuously and automatically renders both software cities and heat maps. It does not require a manual or semi-automatic generation of heat maps and seamlessly blends into the overall software visualization. We implemented this approach in our web-based tool ExplorViz, such that the heat map overlay is also available in our augmented reality environment. ExplorViz is developed as open source software and is continuously published via Docker images. A live demo of ExplorViz is publicly available.

SESep 22, 2020Code
Goals and Measures for Analyzing Power Consumption Data in Manufacturing Enterprises

Sören Henning, Wilhelm Hasselbring, Heinz Burmester et al.

The Internet of Things adoption in the manufacturing industry allows enterprises to monitor their electrical power consumption in real time and at machine level. In this paper, we follow up on such emerging opportunities for data acquisition and show that analyzing power consumption in manufacturing enterprises can serve a variety of purposes. Apart from the prevalent goal of reducing overall power consumption for economical and ecological reasons, such data can, for example, be used to improve production processes. Based on a literature review and expert interviews, we discuss how analyzing power consumption data can serve the goals reporting, optimization, fault detection, and predictive maintenance. To tackle these goals, we propose to implement the measures real-time data processing, multi-level monitoring, temporal aggregation, correlation, anomaly detection, forecasting, visualization, and alerting in software. We transfer our findings to two manufacturing enterprises and show how the presented goals reflect in these enterprises. In a pilot implementation of a power consumption analytics platform, we show how our proposed measures can be implemented with a microservice-based architecture, stream processing techniques, and the fog computing paradigm. We provide the implementations as open source as well as a public demo allowing to reproduce and extend our research.

SEAug 16, 2019Code
FAIR and Open Computer Science Research Software

Wilhelm Hasselbring, Leslie Carr, Simon Hettrick et al.

In computational science and in computer science, research software is a central asset for research. Computational science is the application of computer science and software engineering principles to solving scientific problems, whereas computer science is the study of computer hardware and software design. The Open Science agenda holds that science advances faster when we can build on existing results. Therefore, research software has to be reusable for advancing science. Thus, we need proper research software engineering for obtaining reusable and sustainable research software. This way, software engineering methods may improve research in other disciplines. However, research in software engineering and computer science itself will also benefit from reuse when research software is involved. For good scientific practice, the resulting research software should be open and adhere to the FAIR principles (findable, accessible, interoperable and repeatable) to allow repeatability, reproducibility, and reuse. Compared to research data, research software should be both archived for reproducibility and actively maintained for reusability. The FAIR data principles do not require openness, but research software should be open source software. Established open source software licenses provide sufficient licensing options, such that it should be the rare exception to keep research software closed. We review and analyze the current state in this area in order to give recommendations for making computer science research software FAIR and open. We observe that research software publishing practices in computer science and in computational science show significant differences.

SEJul 12, 2019Code
Modularization of Research Software for Collaborative Open Source Development

Christian Zirkelbach, Alexander Krause, Wilhelm Hasselbring

Software systems evolve over their lifetime. Changing conditions, such as requirements or customer requests make it inevitable for developers to perform adjustments to the underlying code base. Especially in the context of open source software where everybody can contribute, requirements can change over time and new user groups may be addressed. In particular, research software is often not structured with a maintainable and extensible architecture. In combination with obsolescent technologies, this is a challenging task for new developers, especially, when students are involved. In this paper, we report on the modularization process and architecture of our open source research project ExplorViz towards a microservice architecture. The new architecture facilitates a collaborative development process for both researchers and students. We describe the modularization measures and present how we solved occurring issues and enhanced our development process. Afterwards, we illustrate our modularization approach with our modernized, extensible software system architecture and highlight the improved collaborative development process. Finally, we present a proof-of-concept implementation featuring several developed extensions in terms of architecture and extensibility.

14.1DBApr 1
Know Your Streams: On the Conceptualization, Characterization, and Generation of Intentional Event Streams

Andrea Maldonado, Christian Imenkamp, Hendrik Reiter et al.

The shift toward IoT-enabled, sensor-driven systems has transformed how operational data is generated, favoring continuous, real-time event streams (ES) over static event logs. This evolution presents new challenges for Streaming Process Mining (SPM), which must cope with out-of-order events, concurrent activities, incomplete cases, and concept drifts. Yet, the evaluation of SPM algorithms remains rooted in outdated practices, relying on static logs or artificially streamified data that fail to reflect the complexities of real-world streams. To address this gap, we first perform a comprehensive review of data stream literature to identify stream characteristics currently not reflected in the SPM community. Next, we use this information to extend the conceptual foundation for ES. Finally, we propose Stream of Intent, a prototype generator to produce ES with specific features. Our evaluation shows excellence in producing reproducible, intentional ES for targeted benchmarking and adaptive algorithm development in SPM.

SEFeb 1, 2022
Thematic Domain Analysis for Ocean Modeling

Reiner Jung, Sven Gundlach, Wilhelm Hasselbring

Ocean science is a discipline that employs ocean models as an essential research asset. Such scientific modeling provides mathematical abstractions of real-world systems, e.g., the oceans. These models are then coded as implementations of the mathematical abstractions. The developed software systems are called models of the real-world system. To advance the state in engineering such ocean models, we intend to better understand how ocean models are developed and maintained in ocean science. In this paper, we present the results of semi-structured interviews and the Thematic Analysis~(TA) of the interview results to analyze the domain of ocean modeling. Thereby, we identified developer requirements and impediments to model development and evolution, and related themes. This analysis can help to understand where methods from software engineering should be introduced and which challenges need to be addressed. We suggest that other researchers extend and repeat our TA with model developers and research software engineers working in related domains to further advance our knowledge and skills in scientific modeling.

SEOct 20, 2021
JavaBERT: Training a transformer-based model for the Java programming language

Nelson Tavares de Sousa, Wilhelm Hasselbring

Code quality is and will be a crucial factor while developing new software code, requiring appropriate tools to ensure functional and reliable code. Machine learning techniques are still rarely used for software engineering tools, missing out the potential benefits of its application. Natural language processing has shown the potential to process text data regarding a variety of tasks. We argue, that such models can also show similar benefits for software code processing. In this paper, we investigate how models used for natural language processing can be trained upon software code. We introduce a data retrieval pipeline for software code and train a model upon Java software code. The resulting model, JavaBERT, shows a high accuracy on the masked language modeling task showing its potential for software engineering tools.

SEAug 19, 2021
Software Development Processes in Ocean System Modeling

Reiner Jung, Sven Gundlach, Wilhelm Hasselbring

Scientific modeling provides mathematical abstractions of real-world systems and builds software as implementations of these mathematical abstractions. Ocean science is a multidisciplinary discipline developing scientific models and simulations as ocean system models that are an essential research asset. In software engineering and information systems research, modeling is also an essential activity. In particular, business process modeling for business process management and systems engineering is the activity of representing processes of an enterprise, so that the current process may be analyzed, improved, and automated. In this paper, we employ process modeling for analyzing scientific software development in ocean science to advance the state in engineering of ocean system models and to better understand how ocean system models are developed and maintained in ocean science. We interviewed domain experts in semi-structured interviews, analyzed the results via thematic analysis, and modeled the results via the business process modeling notation BPMN. The processes modeled as a result describe an aspired state of software development in the domain, which are often not (yet) implemented. This enables existing processes in simulation-based system engineering to be improved with the help of these process models.

SEAug 18, 2021
Control Flow Versus Data Flow in Distributed Systems Integration: Revival of Flow-Based Programming for the Industrial Internet of Things

Wilhelm Hasselbring, Maik Wojcieszak, Schahram Dustdar

When we consider the application layer of networked infrastructures, data and control flow are important concerns in distributed systems integration. Modularity is a fundamental principle in software design, in particular for distributed system architectures. Modularity emphasizes high cohesion of individual modules and low coupling between modules. Microservices are a recent modularization approach with the specific requirements of independent deployability and, in particular, decentralized data management. Cohesiveness of microservices goes hand-in-hand with loose coupling, making the development, deployment, and evolution of microservice architectures flexible and scalable. However, in our experience with microservice architectures, interactions and flows among microservices are usually more complex than in traditional, monolithic enterprise systems, since services tend to be smaller and only have one responsibility, causing collaboration needs. We suggest that for loose coupling among microservices, explicit control-flow modeling and execution with central workflow engines should be avoided on the application integration level. On the level of integrating microservices, data-flow modeling should be dominant. Control-flow should be secondary and preferably delegated to the microservices. We discuss coupling in distributed systems integration and reflect the history of business process modeling with respect to data and control flow. To illustrate our recommendations, we present some results for flow-based programming in our Industrial DevOps project Titan, where we employ flow-based programming for the Industrial Internet of Things.

SEMay 1, 2021
Benchmarking as Empirical Standard in Software Engineering Research

Wilhelm Hasselbring

In empirical software engineering, benchmarks can be used for comparing different methods, techniques and tools. However, the recent ACM SIGSOFT Empirical Standards for Software Engineering Research do not include an explicit checklist for benchmarking. In this paper, we discuss benchmarks for software performance and scalability evaluation as example research areas in software engineering, relate benchmarks to some other empirical research methods, and discuss the requirements on benchmarks that may constitute the basis for a checklist of a benchmarking standard for empirical software engineering research.

SEMar 21, 2021
Continuous API Evolution in Heterogenous Enterprise Software Systems

Holger Knoche, Wilhelm Hasselbring

The ability to independently deploy parts of a software system is one of the cornerstones of modern software development, and allows for these parts to evolve independently and at different speeds. A major challenge of such independent deployment, however, is to ensure that despite their individual evolution, the interfaces between interacting parts remain compatible. This is especially important for enterprise software systems, which are often highly integrated and based on heterogenous IT infrastructures. Although several approaches for interface evolution have been proposed, many of these rely on the developer to adhere to certain rules, but provide little guidance for doing so. In this paper, we present an approach for interface evolution that is easy to use for developers, and also addresses typical challenges of heterogenous enterprise software, especially legacy system integration.

SEMar 17, 2021
Towards Automated Metamorphic Test Identification for Ocean System Models

Dilip Jagadeeshwarswamy Hiremath, Martin Claus, Wilhelm Hasselbring et al.

Metamorphic testing seeks to verify software in the absence of test oracles. Our application domain is ocean system modeling, where test oracles rarely exist, but where symmetries of the simulated physical systems are known. The input data set is large owing to the requirements of the application domain. This paper presents work in progress for the automated generation of metamorphic test scenarios using machine learning. We extended our previously proposed method [1] to identify metamorphic relations with reduced computational complexity. Initially, we represent metamorphic relations as identity maps. We construct a cost function that minimizes for identifying a metamorphic relation orthogonal to previously found metamorphic relations and penalize for the identity map. A machine learning algorithm is used to identify all possible metamorphic relations minimizing the defined cost function. We propose applying dimensionality reduction techniques to identify attributes in the input which have high variance among the identified metamorphic relations. We apply mutation on these selected attributes to identify distinct metamorphic relations with reduced computational complexity. For experimental evaluation, we subject the two implementations of an ocean-modeling application to the proposed method to present the use of metamorphic relations to test the two implementations of this application.

SEMar 16, 2021
Prototyping Autonomous Robotic Networks on Different Layers of RAMI 4.0 with Digital Twins

Alexander Barbie, Wilhelm Hasselbring, Niklas Pech et al.

In this decade, the amount of (industrial) Internet of Things devices will increase tremendously. Today, there exist no common standards for interconnection, observation, or the monitoring of these devices. In context of the German "Industrie 4.0" strategy the Reference Architectural Model Industry 4.0 (RAMI 4.0) was introduced to connect different aspects of this rapid development. The idea is to let different stakeholders of these products speak and understand the same terminology. In this paper, we present an approach using Digital Twins to prototype different layers along the axis of the RAMI 4.0, by the example of an autonomous ocean observation system developed in the project ARCHES.

SEMar 15, 2021
Developing an Underwater Network of Ocean Observation Systems with Digital Twin Prototypes -- A Field Report from the Baltic Sea

Alexander Barbie, Niklas Pech, Wilhelm Hasselbring et al.

During the research cruise AL547 with RV ALKOR (October 20-31, 2020), a collaborative underwater network of ocean observation systems was deployed in Boknis Eck (SW Baltic Sea, German exclusive economic zone (EEZ)) in the context of the project ARCHES (Autonomous Robotic Networks to Help Modern Societies). This network was realized via a Digital Twin Prototype approach. During that period different scenarios were executed to demonstrate the feasibility of Digital Twins in an extreme environment such as underwater. One of the scenarios showed the collaboration of stage IV Digital Twins with their physical counterparts on the seafloor. This way, we address the research question, whether Digital Twins represent a feasible approach to operate mobile ad hoc networks for ocean and coastal observation.

SESep 3, 2020
Automated identification of metamorphic test scenarios for an ocean-modeling application

Dilip J. Hiremath, Martin Claus, Wilhelm Hasselbring et al.

Metamorphic testing seeks to validate software in the absence of test oracles. Our application domain is ocean modeling, where test oracles often do not exist, but where symmetries of the simulated physical systems are known. In this short paper we present work in progress for automated generation of metamorphic test scenarios using machine learning. Metamorphic testing may be expressed as f(g(X))=h(f(X)) with f being the application under test, with input data X, and with the metamorphic relation (g, h). Automatically generated metamorphic relations can be used for constructing regression tests, and for comparing different versions of the same software application. Here, we restrict to h being the identity map. Then, the task of constructing tests means finding different g which we tackle using machine learning algorithms. These algorithms typically minimize a cost function. As one possible g is already known to be the identity map, for finding a second possible g, we construct the cost function to minimize for g being a metamorphic relation and to penalize for g being the identity map. After identifying the first metamorphic relation, the procedure is repeated with a cost function rewarding g that are orthogonal to previously found metamorphic relations. For experimental evaluation, two implementations of an ocean-modeling application will be subjected to the proposed method with the objective of presenting the use of metamorphic relations to test the implementations of the applications.

SESep 1, 2020
Theodolite: Scalability Benchmarking of Distributed Stream Processing Engines in Microservice Architectures

Sören Henning, Wilhelm Hasselbring

Distributed stream processing engines are designed with a focus on scalability to process big data volumes in a continuous manner. We present the Theodolite method for benchmarking the scalability of distributed stream processing engines. Core of this method is the definition of use cases that microservices implementing stream processing have to fulfill. For each use case, our method identifies relevant workload dimensions that might affect the scalability of a use case. We propose to design one benchmark per use case and relevant workload dimension. We present a general benchmarking framework, which can be applied to execute the individual benchmarks for a given use case and workload dimension. Our framework executes an implementation of the use case's dataflow architecture for different workloads of the given dimension and various numbers of processing instances. This way, it identifies how resources demand evolves with increasing workloads. Within the scope of this paper, we present 4 identified use cases, derived from processing Industrial Internet of Things data, and 7 corresponding workload dimensions. We provide implementations of 4 benchmarks with Kafka Streams and Apache Flink as well as an implementation of our benchmarking framework to execute scalability benchmarks in cloud environments. We use both for evaluating the Theodolite method and for benchmarking Kafka Streams' and Flink's scalability for different deployment options.

SEMar 5, 2020
Microservice Decomposition via Static and Dynamic Analysis of the Monolith

Alexander Krause, Christian Zirkelbach, Wilhelm Hasselbring et al.

Migrating monolithic software systems into microservices requires the application of decomposition techniquesto find and select appropriate service boundaries. These techniques are often based on domain knowledge, static code analysis, and non-functional requirements such as maintainability. In this paper, we present our experience with an approach that extends static analysis with dynamic analysis of a legacy software system's runtime behavior, including the live trace visualization to support the decomposition into microservices. Overall, our approach combines established analysis techniques for microservice decomposition, such as the bounded context pattern of domain-driven design, and enriches the collected information via dynamic software visualization to identify appropriate microservice boundaries. In collaboration with the German IT service provider adesso SE, we applied our approach to their real-word, legacy lottery application in|FOCUS to identify good microservice decompositions for this layered monolithic Enterprise Java system.

DCNov 15, 2019
Scalable and Reliable Multi-Dimensional Aggregation of Sensor Data Streams

Sören Henning, Wilhelm Hasselbring

Ever-increasing amounts of data and requirements to process them in real time lead to more and more analytics platforms and software systems being designed according to the concept of stream processing. A common area of application is the processing of continuous data streams from sensors, for example, IoT devices or performance monitoring tools. In addition to analyzing pure sensor data, analyses of data for groups of sensors often need to be performed as well. Therefore, data streams of the individual sensors have to be continuously aggregated to a data stream for a group. Motivated by a real-world application scenario, we propose that such a stream aggregation approach has to allow for aggregating sensors in hierarchical groups, support multiple such hierarchies in parallel, provide reconfiguration at runtime, and preserve the scalability and reliability qualities induced by applying stream processing techniques. We propose a stream processing architecture fulfilling these requirements, which can be integrated into existing big data architectures. We present a pilot implementation of such an extended architecture and show how it is used in industry. Furthermore, in experimental evaluations we show that our solution scales linearly with the amount of sensors and provides adequate reliability in the case of faults.

SESep 27, 2019
Comparing Static and Dynamic Weighted Software Coupling Metrics

Henning Schnoor, Wilhelm Hasselbring

Coupling metrics are an established way to measure software architecture quality with respect to modularity. Static coupling metrics are obtained from the source or compiled code of a program, while dynamic metrics use runtime data gathered e.g., by monitoring a system in production. We study \emph{weighted} dynamic coupling that takes into account how often a connection is executed during a system's run. We investigate the correlation between dynamic weighted metrics and their static counterparts. We use data collected from four different experiments, each monitoring production use of a commercial software system over a period of four weeks. We observe an unexpected level of correlation between the static and the weighted dynamic case as well as revealing differences between class- and package-level analyses.

SEJul 3, 2019
Industrial DevOps

Wilhelm Hasselbring, Sören Henning, Björn Latte et al.

The visions and ideas of Industry 4.0 require a profound interconnection of machines, plants, and IT systems in industrial production environments. This significantly increases the importance of software, which is coincidentally one of the main obstacles to the introduction of Industry 4.0. Lack of experience and knowledge, high investment and maintenance costs, as well as uncertainty about future developments cause many small and medium-sized enterprises hesitating to adopt Industry 4.0 solutions. We propose Industrial DevOps as an approach to introduce methods and culture of DevOps into industrial production environments. The fundamental concept of this approach is a continuous process of operation, observation, and development of the entire production environment. This way, all stakeholders, systems, and data can thus be integrated via incremental steps and adjustments can be made quickly. Furthermore, we present the Titan software platform accompanied by a role model for integrating production environments with Industrial DevOps. In two initial industrial application scenarios, we address the challenges of energy management and predictive maintenance with the methods, organizational structures, and tools of Industrial DevOps.

SEJul 1, 2019
A Scalable Architecture for Power Consumption Monitoring in Industrial Production Environments

Sören Henning, Wilhelm Hasselbring, Armin Möbius

Detailed knowledge about the electrical power consumption in industrial production environments is a prerequisite to reduce and optimize their power consumption. Today's industrial production sites are equipped with a variety of sensors that, inter alia, monitor electrical power consumption in detail. However, these environments often lack an automated data collation and analysis. We present a system architecture that integrates different sensors and analyzes and visualizes the power consumption of devices, machines, and production plants. It is designed with a focus on scalability to support production environments of various sizes and to handle varying loads. We argue that a scalable architecture in this context must meet requirements for fault tolerance, extensibility, real-time data processing, and resource efficiency. As a solution, we propose a microservice-based architecture augmented by big data and stream processing techniques. Applying the fog computing paradigm, parts of it are deployed in an elastic, central cloud while other parts run directly, decentralized in the production environment. A prototype implementation of this architecture presents solutions how different kinds of sensors can be integrated and their measurements can be continuously aggregated. In order to make analyzed data comprehensible, it features a single-page web application that provides different forms of data visualization. We deploy this pilot implementation in the data center of a medium-sized enterprise, where we successfully monitor the power consumption of 16~servers. Furthermore, we show the scalability of our architecture with 20,000~simulated sensors.

SEAug 18, 2015
Performance-oriented DevOps: A Research Agenda

Andreas Brunnert, Andre van Hoorn, Felix Willnecker et al.

DevOps is a trend towards a tighter integration between development (Dev) and operations (Ops) teams. The need for such an integration is driven by the requirement to continuously adapt enterprise applications (EAs) to changes in the business environment. As of today, DevOps concepts have been primarily introduced to ensure a constant flow of features and bug fixes into new releases from a functional perspective. In order to integrate a non-functional perspective into these DevOps concepts this report focuses on tools, activities, and processes to ensure one of the most important quality attributes of a software system, namely performance. Performance describes system properties concerning its timeliness and use of resources. Common metrics are response time, throughput, and resource utilization. Performance goals for EAs are typically defined by setting upper and/or lower bounds for these metrics and specific business transactions. In order to ensure that such performance goals can be met, several activities are required during development and operation of these systems as well as during the transition from Dev to Ops. Activities during development are typically summarized by the term Software Performance Engineering (SPE), whereas activities during operations are called Application Performance Management (APM). SPE and APM were historically tackled independently from each other, but the newly emerging DevOps concepts require and enable a tighter integration between both activity streams. This report presents existing solutions to support this integration as well as open research challenges in this area.