Heike Leitte

LG
h-index23
12papers
122citations
Novelty32%
AI Score38

12 Papers

LGMar 10, 2023
Deep Anomaly Detection on Tennessee Eastman Process Data

Fabian Hartung, Billy Joe Franks, Tobias Michels et al.

This paper provides the first comprehensive evaluation and analysis of modern (deep-learning) unsupervised anomaly detection methods for chemical process data. We focus on the Tennessee Eastman process dataset, which has been a standard litmus test to benchmark anomaly detection methods for nearly three decades. Our extensive study will facilitate choosing appropriate anomaly detection methods in industrial applications.

LGOct 20, 2025
Batch Distillation Data for Developing Machine Learning Anomaly Detection Methods

Justus Arweiler, Indra Jungjohann, Aparna Muraleedharan et al.

Machine learning (ML) holds great potential to advance anomaly detection (AD) in chemical processes. However, the development of ML-based methods is hindered by the lack of openly available experimental data. To address this gap, we have set up a laboratory-scale batch distillation plant and operated it to generate an extensive experimental database, covering fault-free experiments and experiments in which anomalies were intentionally induced, for training advanced ML-based AD methods. In total, 119 experiments were conducted across a wide range of operating conditions and mixtures. Most experiments containing anomalies were paired with a corresponding fault-free one. The database that we provide here includes time-series data from numerous sensors and actuators, along with estimates of measurement uncertainty. In addition, unconventional data sources -- such as concentration profiles obtained via online benchtop NMR spectroscopy and video and audio recordings -- are provided. Extensive metadata and expert annotations of all experiments are included. The anomaly annotations are based on an ontology developed in this work. The data are organized in a structured database and made freely available via doi.org/10.5281/zenodo.17395544. This new database paves the way for the development of advanced ML-based AD methods. As it includes information on the causes of anomalies, it further enables the development of interpretable and explainable ML approaches, as well as methods for anomaly mitigation.

LGOct 20, 2025
Formally Exploring Time-Series Anomaly Detection Evaluation Metrics

Dennis Wagner, Arjun Nair, Billy Joe Franks et al.

Undetected anomalies in time series can trigger catastrophic failures in safety-critical systems, such as chemical plant explosions or power grid outages. Although many detection methods have been proposed, their performance remains unclear because current metrics capture only narrow aspects of the task and often yield misleading results. We address this issue by introducing verifiable properties that formalize essential requirements for evaluating time-series anomaly detection. These properties enable a theoretical framework that supports principled evaluations and reliable comparisons. Analyzing 37 widely used metrics, we show that most satisfy only a few properties, and none satisfy all, explaining persistent inconsistencies in prior results. To close this gap, we propose LARM, a flexible metric that provably satisfies all properties, and extend it to ALARM, an advanced variant meeting stricter requirements.

CEJun 11, 2025
Superstudent intelligence in thermodynamics

Rebecca Loubet, Pascal Zittlau, Marco Hoffmann et al.

In this short note, we report and analyze a striking event: OpenAI's large language model o3 has outwitted all students in a university exam on thermodynamics. The thermodynamics exam is a difficult hurdle for most students, where they must show that they have mastered the fundamentals of this important topic. Consequently, the failure rates are very high, A-grades are rare - and they are considered proof of the students' exceptional intellectual abilities. This is because pattern learning does not help in the exam. The problems can only be solved by knowledgeably and creatively combining principles of thermodynamics. We have given our latest thermodynamics exam not only to the students but also to OpenAI's most powerful reasoning model, o3, and have assessed the answers of o3 exactly the same way as those of the students. In zero-shot mode, the model o3 solved all problems correctly, better than all students who took the exam; its overall score was in the range of the best scores we have seen in more than 10,000 similar exams since 1985. This is a turning point: machines now excel in complex tasks, usually taken as proof of human intellectual capabilities. We discuss the consequences this has for the work of engineers and the education of future engineers.

NAApr 16, 2024
Machine Learning Based Optimization Workflow for Tuning Numerical Settings of Differential Equation Solvers for Boundary Value Problems

Viny Saajan Victor, Manuel Ettmüller, Andre Schmeißer et al.

Several numerical differential equation solvers have been employed effectively over the years as an alternative to analytical solvers to quickly and conveniently solve differential equations. One category of these is boundary value solvers, which are used to solve real-world problems formulated as differential equations with boundary conditions. These solvers require certain numerical settings to solve the differential equations that affect their solvability and performance. A systematic fine-tuning of these settings is required to obtain the desired solution and performance. Currently, these settings are either selected by trial and error or require domain expertise. In this paper, we propose a machine learning-based optimization workflow for fine-tuning the numerical settings to reduce the time and domain expertise required in the process. In the evaluation section, we discuss the scalability, stability, and reliability of the proposed workflow. We demonstrate our workflow on a numerical boundary value problem solver.

LGApr 15, 2024
Machine learning-based optimization workflow of the homogeneity of spunbond nonwovens with human validation

Viny Saajan Victor, Andre Schmeißer, Heike Leitte et al.

In the last ten years, the average annual growth rate of nonwoven production was 4%. In 2020 and 2021, nonwoven production has increased even further due to the huge demand for nonwoven products needed for protective clothing such as FFP2 masks to combat the COVID19 pandemic. Optimizing the production process is still a challenge due to its high nonlinearity. In this paper, we present a machine learning-based optimization workflow aimed at improving the homogeneity of spunbond nonwovens. The optimization workflow is based on a mathematical model that simulates the microstructures of nonwovens. Based on trainingy data coming from this simulator, different machine learning algorithms are trained in order to find a surrogate model for the time-consuming simulator. Human validation is employed to verify the outputs of machine learning algorithms by assessing the aesthetics of the nonwovens. We include scientific and expert knowledge into the training data to reduce the computational costs involved in the optimization process. We demonstrate the necessity and effectiveness of our workflow in optimizing the homogeneity of nonwovens.

LGJul 28, 2021
Attribute-based Explanations of Non-Linear Embeddings of High-Dimensional Data

Jan-Tobias Sohns, Michaela Schmitt, Fabian Jirasek et al.

Embeddings of high-dimensional data are widely used to explore data, to verify analysis results, and to communicate information. Their explanation, in particular with respect to the input attributes, is often difficult. With linear projects like PCA the axes can still be annotated meaningfully. With non-linear projections this is no longer possible and alternative strategies such as attribute-based color coding are required. In this paper, we review existing augmentation techniques and discuss their limitations. We present the Non-Linear Embeddings Surveyor (NoLiES) that combines a novel augmentation strategy for projected data (rangesets) with interactive analysis in a small multiples setting. Rangesets use a set-based visualization approach for binned attribute values that enable the user to quickly observe structure and detect outliers. We detail the link between algebraic topology and rangesets and demonstrate the utility of NoLiES in case studies with various challenges (complex attribute value distribution, many attributes, many data points) and a real-world application to understand latent features of matrix completion in thermodynamics.

HCJul 23, 2021
Knowledge Rocks:Adding Knowledge Assistance to Visualization Systems

Anna-Pia Lohfink, Simon D. Duque Anton, Heike Leitte et al.

We present Knowledge Rocks, an implementation strategy and guideline for augmenting visualization systems to knowledge-assisted visualization systems, as defined by the KAVA model. Visualization systems become more and more sophisticated. Hence, it is increasingly important to support users with an integrated knowledge base in making constructive choices and drawing the right conclusions. We support the effective reactivation of visualization software resources by augmenting them with knowledge-assistance. To provide a general and yet supportive implementation strategy, we propose an implementation process that bases on an application-agnostic architecture. This architecture is derived from existing knowledge-assisted visualization systems and the KAVA model. Its centerpiece is an ontology that is able to automatically analyze and classify input data, linked to a database to store classified instances. We discuss design decisions and advantages of the KR framework and illustrate its broad area of application in diverse integration possibilities of this architecture into an existing visualization system. In addition, we provide a detailed case study by augmenting an it-security system with knowledge-assistance facilities.

CVDec 29, 2020
Visual Probing and Correction of Object Recognition Models with Interactive user feedback

Viny Saajan Victor, Pramod Vadiraja, Jan-Tobias Sohns et al.

With the advent of state-of-the-art machine learning and deep learning technologies, several industries are moving towards the field. Applications of such technologies are highly diverse ranging from natural language processing to computer vision. Object recognition is one such area in the computer vision domain. Although proven to perform with high accuracy, there are still areas where such models can be improved. This is in-fact highly important in real-world use cases like autonomous driving or cancer detection, that are highly sensitive and expect such technologies to have almost no uncertainties. In this paper, we attempt to visualise the uncertainties in object recognition models and propose a correction process via user feedback. We further demonstrate our approach on the data provided by the VAST 2020 Mini-Challenge 2.

CRDec 10, 2019
Security in Process: Visually Supported Triage Analysis in Industrial Process Data

Anna-Pia Lohfink, Simon D. Duque Anton, Hans Dieter Schotten et al.

Operation technology networks, i.e. hard- and software used for monitoring and controlling physical/industrial processes, have been considered immune to cyber attacks for a long time. A recent increase of attacks in these networks proves this assumption wrong. Several technical constraints lead to approaches to detect attacks on industrial processes using available sensor data. This setting differs fundamentally from anomaly detection in IT-network traffic and requires new visualization approaches adapted to the common periodical behavior in OT-network data. We present a tailored visualization system that utilizes inherent features of measurements from industrial processes to full capacity to provide insight into the data and support triage analysis by laymen and experts. The novel combination of spiral plots with results from anomaly detection was implemented in an interactive system. The capabilities of our system are demonstrated using sensor and actuator data from a real-world water treatment process with introduced attacks. Exemplary analysis strategies are presented. Finally, we evaluate effectiveness and usability of our system and perform an expert evaluation.

ATJul 31, 2019
Topological Machine Learning with Persistence Indicator Functions

Bastian Rieck, Filip Sadlo, Heike Leitte

Techniques from computational topology, in particular persistent homology, are becoming increasingly relevant for data analysis. Their stable metrics permit the use of many distance-based data analysis methods, such as multidimensional scaling, while providing a firm theoretical ground. Many modern machine learning algorithms, however, are based on kernels. This paper presents persistence indicator functions (PIFs), which summarize persistence diagrams, i.e., feature descriptors in topological data analysis. PIFs can be calculated and compared in linear time and have many beneficial properties, such as the availability of a kernel-based similarity measure. We demonstrate their usage in common data analysis scenarios, such as confidence set estimation and classification of complex structured data.

ATJul 31, 2019
Persistent Intersection Homology for the Analysis of Discrete Data

Bastian Rieck, Markus Banagl, Filip Sadlo et al.

Topological data analysis is becoming increasingly relevant to support the analysis of unstructured data sets. A common assumption in data analysis is that the data set is a sample---not necessarily a uniform one---of some high-dimensional manifold. In such cases, persistent homology can be successfully employed to extract features, remove noise, and compare data sets. The underlying problems in some application domains, however, turn out to represent multiple manifolds with different dimensions. Algebraic topology typically analyzes such problems using intersection homology, an extension of homology that is capable of handling configurations with singularities. In this paper, we describe how the persistent variant of intersection homology can be used to assist data analysis in visualization. We point out potential pitfalls in approximating data sets with singularities and give strategies for resolving them.