LGNov 7, 2023Code
Spatio-Temporal Anomaly Detection with Graph Networks for Data Quality Monitoring of the Hadron CalorimeterMulugeta Weldezgina Asres, Christian Walter Omlin, Long Wang et al.
The Compact Muon Solenoid (CMS) experiment is a general-purpose detector for high-energy collision at the Large Hadron Collider (LHC) at CERN. It employs an online data quality monitoring (DQM) system to promptly spot and diagnose particle data acquisition problems to avoid data quality loss. In this study, we present a semi-supervised spatio-temporal anomaly detection (AD) monitoring system for the physics particle reading channels of the Hadron Calorimeter (HCAL) of the CMS using three-dimensional digi-occupancy map data of the DQM. We propose the GraphSTAD system, which employs convolutional and graph neural networks to learn local spatial characteristics induced by particles traversing the detector and the global behavior owing to shared backend circuit connections and housing boxes of the channels, respectively. Recurrent neural networks capture the temporal evolution of the extracted spatial features. We validate the accuracy of the proposed AD system in capturing diverse channel fault types using the LHC collision data sets. The GraphSTAD system achieves production-level accuracy and is being integrated into the CMS core production system for real-time monitoring of the HCAL. We provide a quantitative performance comparison with alternative benchmark models to demonstrate the promising leverage of the presented system. Code: https://github.com/muleina/CMS_HCAL_ML_OnlineDQM .
LGAug 29, 2024Code
Data Quality Monitoring for the Hadron Calorimeters Using Transfer Learning for Anomaly DetectionMulugeta Weldezgina Asres, Christian Walter Omlin, Long Wang et al.
The proliferation of sensors brings an immense volume of spatio-temporal (ST) data in many domains, including monitoring, diagnostics, and prognostics applications. Data curation is a time-consuming process for a large volume of data, making it challenging and expensive to deploy data analytics platforms in new environments. Transfer learning (TL) mechanisms promise to mitigate data sparsity and model complexity by utilizing pre-trained models for a new task. Despite the triumph of TL in fields like computer vision and natural language processing, efforts on complex ST models for anomaly detection (AD) applications are limited. In this study, we present the potential of TL within the context of high-dimensional ST AD with a hybrid autoencoder architecture, incorporating convolutional, graph, and recurrent neural networks. Motivated by the need for improved model accuracy and robustness, particularly in scenarios with limited training data on systems with thousands of sensors, this research investigates the transferability of models trained on different sections of the Hadron Calorimeter of the Compact Muon Solenoid experiment at CERN. The key contributions of the study include exploring TL's potential and limitations within the context of encoder and decoder networks, revealing insights into model initialization and training configurations that enhance performance while substantially reducing trainable parameters and mitigating data contamination effects. Code: https://github.com/muleina/CMS\_HCAL\_ML\_OnlineDQM .
LGApr 12, 2024
Lightweight Multi-System Multivariate Interconnection and Divergence DiscoveryMulugeta Weldezgina Asres, Christian Walter Omlin, Jay Dittmann et al.
Identifying outlier behavior among sensors and subsystems is essential for discovering faults and facilitating diagnostics in large systems. At the same time, exploring large systems with numerous multivariate data sets is challenging. This study presents a lightweight interconnection and divergence discovery mechanism (LIDD) to identify abnormal behavior in multi-system environments. The approach employs a multivariate analysis technique that first estimates the similarity heatmaps among the sensors for each system and then applies information retrieval algorithms to provide relevant multi-level interconnection and discrepancy details. Our experiment on the readout systems of the Hadron Calorimeter of the Compact Muon Solenoid (CMS) experiment at CERN demonstrates the effectiveness of the proposed method. Our approach clusters readout systems and their sensors consistent with the expected calorimeter interconnection configurations, while capturing unusual behavior in divergent clusters and estimating their root causes.