Rethinking Reconstruction-based Graph-Level Anomaly Detection: Limitations and a Simple Remedy
This addresses a specific issue in graph anomaly detection for researchers and practitioners, offering an incremental improvement over existing methods.
The paper tackled the problem of graph-level anomaly detection using graph autoencoders by identifying limitations in their reconstruction-based approach, such as the reconstruction flip phenomenon, and proposed a simple method called MUSE that uses multifaceted summaries of reconstruction errors, achieving state-of-the-art performance as the best overall among 14 methods across 10 datasets.
Graph autoencoders (Graph-AEs) learn representations of given graphs by aiming to accurately reconstruct them. A notable application of Graph-AEs is graph-level anomaly detection (GLAD), whose objective is to identify graphs with anomalous topological structures and/or node features compared to the majority of the graph population. Graph-AEs for GLAD regard a graph with a high mean reconstruction error (i.e. mean of errors from all node pairs and/or nodes) as anomalies. Namely, the methods rest on the assumption that they would better reconstruct graphs with similar characteristics to the majority. We, however, report non-trivial counter-examples, a phenomenon we call reconstruction flip, and highlight the limitations of the existing Graph-AE-based GLAD methods. Specifically, we empirically and theoretically investigate when this assumption holds and when it fails. Through our analyses, we further argue that, while the reconstruction errors for a given graph are effective features for GLAD, leveraging the multifaceted summaries of the reconstruction errors, beyond just mean, can further strengthen the features. Thus, we propose a novel and simple GLAD method, named MUSE. The key innovation of MUSE involves taking multifaceted summaries of reconstruction errors as graph features for GLAD. This surprisingly simple method obtains SOTA performance in GLAD, performing best overall among 14 methods across 10 datasets.