LG · Apr 10

Automated Batch Distillation Process Simulation for a Large Hybrid Dataset for Deep Anomaly Detection

arXiv:2604.09166 · 34.2 · h-index: 18
AI Analysis

This work provides a unique dataset for simulation-to-experiment transfer and deep anomaly detection research in chemical process monitoring, though the contribution is incremental, since it builds on the authors' prior experimental work.

The authors tackled the lack of large, annotated datasets for deep anomaly detection in chemical processes by augmenting an existing experimental dataset with a simulation dataset, creating a hybrid dataset that covers both normal and anomalous operation. They developed an automated, Python-based simulator that, after calibration to a single reference experiment, predicts the dynamics of the other experiments well, enabling consistent generation of time-series data for a large number of runs.
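The calibrate-once, predict-many idea can be illustrated with a deliberately small example. The sketch below is not the authors' simulator: it assumes a hypothetical linear model of the still holdup, L(t) = L0 - eta * Q * t / dH_vap, with heater power Q, an assumed molar heat of vaporization dH_vap, and a single unknown efficiency eta. eta is fitted by least squares on one reference run and then reused, without refitting, to predict a run at a different heater power. All names (`Run`, `calibrate_eta`, `predict_holdup`) and all numbers are illustrative, and the reference data are synthetic.

```python
from dataclasses import dataclass, field

# Assumed molar heat of vaporization [J/mol]; illustrative value only.
DH_VAP_J_PER_MOL = 35_000.0


@dataclass
class Run:
    """Hypothetical run record: constant heater power and holdup samples."""
    heater_power_w: float
    initial_holdup_mol: float
    times_s: list[float]
    holdup_mol: list[float] = field(default_factory=list)  # empty if not measured


def regressor(run: Run) -> list[float]:
    """Holdup drop per unit eta at each sample time: Q * t / dH_vap [mol]."""
    return [run.heater_power_w * t / DH_VAP_J_PER_MOL for t in run.times_s]


def calibrate_eta(reference: Run) -> float:
    """Least-squares estimate of eta; closed form because the model is linear in eta."""
    a = regressor(reference)
    b = [reference.initial_holdup_mol - h for h in reference.holdup_mol]
    return sum(ai * bi for ai, bi in zip(a, b)) / sum(ai * ai for ai in a)


def predict_holdup(run: Run, eta: float) -> list[float]:
    """Predict the holdup trajectory of another run with the calibrated eta."""
    return [run.initial_holdup_mol - eta * ai for ai in regressor(run)]


# Synthetic reference run, generated with eta = 0.8 (not data from the paper).
reference = Run(3500.0, 300.0, [0.0, 600.0, 1200.0, 1800.0], [300.0, 252.0, 204.0, 156.0])
eta = calibrate_eta(reference)

# A second hypothetical run at a different heater power, predicted without refitting.
other = Run(2500.0, 250.0, [0.0, 900.0])
predicted = predict_holdup(other, eta)
```

Because the toy model is linear in eta, the least-squares fit has a closed form; a realistic distillation model with several nonlinear parameters would typically need a numerical optimizer instead.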

Anomaly detection (AD) in chemical processes based on deep learning offers significant opportunities but requires large, diverse, and well-annotated training datasets that are rarely available from industrial operations. In a recent work, we introduced a large, fully annotated experimental dataset for batch distillation under normal and anomalous operating conditions. In the present study, we augment this dataset with a corresponding simulation dataset, creating a novel hybrid dataset. The simulation data is generated in an automated workflow with a novel Python-based process simulator that employs a tailored index-reduction strategy for the underlying differential-algebraic equations. Leveraging the rich metadata and structured anomaly annotations of the experimental database, experimental records are automatically translated into simulation scenarios. After calibration to a single reference experiment, the dynamics of the other experiments are well predicted. This enabled the fully automated, consistent generation of time-series data for a large number of experimental runs, covering both normal operation and a wide range of actuator- and control-related anomalies. The resulting hybrid dataset is released openly. From a process simulation perspective, this work demonstrates the automated, consistent simulation of large-scale experimental campaigns, using batch distillation as an example. From a data-driven AD perspective, the hybrid dataset provides a unique basis for simulation-to-experiment style transfer, the generation of pseudo-experimental data, and future research on deep AD methods in chemical process monitoring.
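The translation of annotated experimental records into simulation scenarios can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' workflow: the record schema (`ExperimentRecord`, `AnomalyAnnotation`), the single anomaly type (a temporary drop in heater power), and the single-stage batch still with constant relative volatility are all hypothetical stand-ins for the paper's database and column model. The toy still is an index-1 DAE whose algebraic equilibrium relation can be solved explicitly for the vapor composition at every step, so it needs no index reduction; the paper's tailored index-reduction strategy applies to its full model, which this sketch does not reproduce. Each simulated time step carries a label derived from the annotation, which is what makes such output usable for training and evaluating anomaly detectors.

```python
from dataclasses import dataclass


@dataclass
class AnomalyAnnotation:
    """Hypothetical annotation: an actuator change active on [start_s, end_s)."""
    kind: str        # e.g. "heater_power_drop" (illustrative label)
    start_s: float
    end_s: float
    factor: float    # multiplicative change applied to the actuator


@dataclass
class ExperimentRecord:
    """Hypothetical experimental metadata needed to set up one simulation."""
    run_id: str
    heater_power_w: float
    initial_holdup_mol: float
    initial_x_light: float   # mole fraction of the light component in the still
    duration_s: float
    anomalies: list[AnomalyAnnotation]


def scenario_from_record(rec: ExperimentRecord):
    """Translate metadata and annotations into an input profile Q(t) and labels."""
    def heater_power(t: float) -> float:
        q = rec.heater_power_w
        for a in rec.anomalies:
            if a.kind == "heater_power_drop" and a.start_s <= t < a.end_s:
                q *= a.factor
        return q

    def is_anomalous(t: float) -> bool:
        return any(a.start_s <= t < a.end_s for a in rec.anomalies)

    return heater_power, is_anomalous


def simulate(rec: ExperimentRecord, alpha=2.5, eta=0.8, dh_vap=35_000.0, dt=1.0):
    """Explicit-Euler integration of a toy single-stage batch still (index-1 DAE):

        dL/dt = -V,   dx/dt = V (x - y) / L,   0 = y - alpha x / (1 + (alpha - 1) x)

    with boil-up V = eta * Q(t) / dh_vap. Returns rows (t, L, x, y, anomalous).
    """
    heater_power, is_anomalous = scenario_from_record(rec)
    holdup, x = rec.initial_holdup_mol, rec.initial_x_light
    rows = []
    for k in range(int(rec.duration_s / dt) + 1):
        t = k * dt
        y = alpha * x / (1.0 + (alpha - 1.0) * x)  # algebraic equation, solved explicitly
        v = eta * heater_power(t) / dh_vap         # boil-up rate [mol/s]
        rows.append((t, holdup, x, y, is_anomalous(t)))
        # Both right-hand sides use the state at time t.
        holdup, x = holdup - dt * v, x + dt * v * (x - y) / holdup
    return rows


rec = ExperimentRecord(
    run_id="demo-001",  # hypothetical identifier
    heater_power_w=3500.0,
    initial_holdup_mol=300.0,
    initial_x_light=0.4,
    duration_s=1800.0,
    anomalies=[AnomalyAnnotation("heater_power_drop", 600.0, 900.0, 0.5)],
)
rows = simulate(rec)
```

Explicit Euler with a fixed step is used only to keep the sketch dependency-free; a production simulator would typically rely on an implicit or adaptive DAE solver.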
