LG SE · Feb 25, 2021

TELESTO: A Graph Neural Network Model for Anomaly Classification in Cloud Services

arXiv:2102.12877v24.411 citations

Originality: Incremental advance

AI Analysis

The work addresses the need for automated anomaly remediation in AIOps, which matters to IT system operators. The advance is incremental, however: it adapts existing graph neural network methods to this specific setting.

The paper tackles the classification of recurring anomaly types in cloud services, which is challenging because these systems change frequently. It proposes TELESTO, a graph neural network model that is invariant to changes in data dimensionality and reaches up to 85.1% classification accuracy in a real-world cloud testbed.

Deployment, operation, and maintenance of large IT systems are becoming increasingly complex and put human experts under extreme stress when problems occur. Machine learning (ML) and artificial intelligence (AI) are therefore applied to IT system operation and maintenance, a field summarized by the term AIOps. One specific direction aims at recognizing recurring anomaly types to enable automated remediation. However, IT-system-specific properties, especially frequent changes (e.g. software updates, reconfiguration, or hardware modernization), make this recognition challenging. Current methods mainly assume that the provided data has a static dimensionality. We propose a method that is invariant to changes in the dimensionality of the data. Resource metric data such as CPU utilization and allocated memory are modelled as multivariate time series. Temporal and spatial features are extracted, and anomalies are subsequently classified, by TELESTO, our novel graph convolutional neural network (GCNN) architecture. The experimental evaluation is conducted in a real-world cloud testbed deployment hosting two applications. Classification results for injected anomalies on a Cassandra database node show that TELESTO outperforms the alternative GCNNs and achieves an overall classification accuracy of 85.1%. Classification accuracy on the other nodes ranges between 60% and 85%.
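
The dimensionality invariance described in the abstract can be illustrated with a minimal sketch. This is not the TELESTO architecture, whose details are not given here. It only shows one common way a graph convolutional classifier can accept a varying number of metrics: each metric is a graph node, the weights are shared across nodes, and a mean readout pools the nodes before classification. The fully connected metric graph, the window length, the hidden size, the number of anomaly classes, and the random (untrained) weights are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

WINDOW = 20     # time steps per metric window (assumed)
HIDDEN = 8      # hidden feature size per node (assumed)
N_CLASSES = 4   # number of anomaly types (assumed)

# Weights are shared across nodes, so their shapes never depend on
# how many metrics (nodes) a sample contains. Untrained, random values.
W1 = rng.normal(scale=0.1, size=(WINDOW, HIDDEN))
W2 = rng.normal(scale=0.1, size=(HIDDEN, N_CLASSES))


def normalized_adjacency(n):
    """Fully connected graph with self-loops, symmetrically normalized."""
    a = np.ones((n, n))
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def classify(x):
    """x: (n_metrics, WINDOW) array. Returns probabilities of shape (N_CLASSES,)."""
    a_hat = normalized_adjacency(x.shape[0])
    h = np.maximum(a_hat @ x @ W1, 0.0)  # one graph convolution followed by ReLU
    pooled = h.mean(axis=0)              # mean readout removes the node dimension
    logits = pooled @ W2
    e = np.exp(logits - logits.max())    # numerically stable softmax
    return e / e.sum()


# The same parameters handle samples with 5 and with 9 metrics.
p5 = classify(rng.normal(size=(5, WINDOW)))
p9 = classify(rng.normal(size=(9, WINDOW)))
print(p5.shape, p9.shape)
```

Because no parameter shape depends on the number of metrics, adding or removing a metric after a software update or reconfiguration would not require changing the model's structure in this sketch. Whether TELESTO achieves its invariance in the same way is not stated in the abstract.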
