AP LG Jun 17, 2021

Pre-treatment of outliers and anomalies in plant data: Methodology and case study of a Vacuum Distillation Unit

Kamil Oster, Stefan Güttel, Jonathan L. Shapiro, Lu Chen, Megan Jobson

arXiv:2106.14641

Originality: Synthesis-oriented

AI Analysis

This work addresses data quality issues for predictive modeling in refinery operations, but it is incremental, as it adapts existing methods to a specific industrial context.

The paper tackles the problem of outlier detection in industrial plant data, showing that a piecewise 3σ method improves short-term outlier identification over the standard 3σ approach, and that PCA combined with DBSCAN effectively handles long-term outliers in a vacuum distillation unit case study.

Data pre-treatment plays a significant role in improving data quality, thus allowing extraction of accurate information from raw data. One commonly used data pre-treatment technique is outlier detection. The so-called 3$σ$ method is common practice for identifying outliers. As shown in the manuscript, it does not identify all outliers, which can distort the overall statistics of the data. This problem can have a significant impact on further data analysis and can reduce the accuracy of predictive models. There is a plethora of techniques for outlier detection; however, aside from theoretical work, they all require case study work. Two types of outliers were considered: short-term outliers (erroneous data, noise) and long-term outliers (e.g. malfunctioning for longer periods). The data used were taken from the vacuum distillation unit (VDU) of an Asian refinery and included 40 physical sensors (temperature, pressure and flow rate). We used a modified method for 3$σ$ thresholds to identify the short-term outliers, i.e. sensor data are divided into chunks determined by change points, and 3$σ$ thresholds are calculated within each chunk, each of which represents a near-normal distribution. We have shown that the piecewise 3$σ$ method offers a better approach to short-term outlier detection than the 3$σ$ method applied to the entire time series. Nevertheless, the piecewise method does not perform well for long-term outliers (which can represent another state in the data). In this case, we used principal component analysis (PCA) with Hotelling's $T^2$ statistic to identify the long-term outliers. The results obtained with PCA were then subjected to the DBSCAN clustering method. The outliers (which were visually obvious and correctly detected by the PCA method) were also correctly identified by DBSCAN, which supported the consistency and accuracy of the PCA method.
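
The piecewise 3$σ$ idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `piecewise_three_sigma` and `global_three_sigma` are hypothetical, the change points are assumed to be already known (the abstract does not say how they are detected), and the two-level signal is synthetic rather than refinery data.

```python
from statistics import mean, stdev


def global_three_sigma(values):
    """Flag points outside mean +/- 3*std computed over the whole series."""
    mu, sigma = mean(values), stdev(values)
    return [abs(v - mu) > 3 * sigma for v in values]


def piecewise_three_sigma(values, change_points):
    """Flag points outside mean +/- 3*std computed within each chunk.

    change_points: sorted indices at which a new chunk starts
    (assumed to be supplied by a separate change-point detector).
    """
    bounds = [0, *change_points, len(values)]
    flags = [False] * len(values)
    for start, end in zip(bounds[:-1], bounds[1:]):
        chunk = values[start:end]
        if len(chunk) < 2:
            continue  # too few points to estimate a standard deviation
        mu, sigma = mean(chunk), stdev(chunk)
        for i in range(start, end):
            flags[i] = abs(values[i] - mu) > 3 * sigma
    return flags


# Synthetic signal: two operating levels (about 10 and about 50)
# with a single spike of 14.0 inside the first level.
signal = [10.0 + 0.1 * (-1) ** i for i in range(20)]
signal += [50.0 + 0.1 * (-1) ** i for i in range(20)]
signal[5] = 14.0

piecewise_hits = [i for i, f in enumerate(piecewise_three_sigma(signal, [20])) if f]
global_hits = [i for i, f in enumerate(global_three_sigma(signal)) if f]
print(piecewise_hits)  # [5]
print(global_hits)     # []
```

In this toy case the level shift inflates the global standard deviation so much that the spike stays inside the global 3$σ$ band, while the per-chunk thresholds catch it. This mirrors the abstract's point that the 3$σ$ method applied to the entire time series can miss short-term outliers.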
