LG | Jun 4, 2025

How to Use Graph Data in the Wild to Help Graph Anomaly Detection?

Yuxuan Cao, Jiarong Xu, Chen Zhao, Jiaan Wang, Carl Yang, Chunping Wang, Yang Yang

arXiv:2506.04190v111.44 citations | h-index: 15 | KDD

Originality: Incremental advance

AI Analysis

This paper addresses label scarcity and data insufficiency in graph anomaly detection for domains such as social and financial networks. The approach is novel, but the overall advance is incremental.

The paper tackles graph anomaly detection in settings where insufficient data makes the normal distribution hard to capture. It proposes a framework that draws on external graph data and reports average improvements of 18% in AUCROC and 32% in AUCPR over the best-competing methods.

In recent years, graph anomaly detection has found extensive applications in various domains such as social, financial, and communication networks. However, anomalies in graph-structured data present unique challenges, including label scarcity, ill-defined anomalies, and varying anomaly types, making supervised or semi-supervised methods unreliable. Researchers often adopt unsupervised approaches to address these challenges, assuming that anomalies deviate significantly from the normal data distribution. Yet, when the available data is insufficient, capturing the normal distribution accurately and comprehensively becomes difficult. To overcome this limitation, we propose to utilize external graph data (i.e., graph data in the wild) to help anomaly detection tasks. This naturally raises the question: How can we use external data to help graph anomaly detection tasks? To answer this question, we propose a framework called Wild-GAD. It is built upon a unified database, UniWildGraph, which comprises a large and diverse collection of graph data with broad domain coverage, ample data volume, and a unified feature space. Further, we develop selection criteria based on representativity and diversity to identify the most suitable external data for the anomaly detection task. Extensive experiments on six real-world datasets demonstrate the effectiveness of Wild-GAD. Our framework achieves an average improvement of 18% in AUCROC and 32% in AUCPR over the best-competing methods.
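The abstract does not say how representativity and diversity are computed, so the following is only an illustrative sketch and not the Wild-GAD procedure. It assumes each graph has already been summarized as a fixed-length embedding vector, measures representativity as cosine similarity to the target graph, and penalizes redundancy with already-selected graphs, in the style of greedy maximal marginal relevance. The function names, the `trade_off` parameter, and the toy embeddings are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def select_external_graphs(target, candidates, k, trade_off=0.5):
    """Greedily pick up to k candidate graphs (hypothetical helper).

    target:     embedding of the target graph (assumed to be given)
    candidates: dict mapping graph name -> embedding
    trade_off:  weight on representativity; (1 - trade_off) weights the
                redundancy penalty that encourages diversity
    """
    selected = []
    remaining = dict(candidates)
    while remaining and len(selected) < k:

        def score(name):
            # Representativity: similarity to the target graph.
            representativity = cosine(remaining[name], target)
            # Redundancy: highest similarity to anything already selected.
            redundancy = max(
                (cosine(remaining[name], candidates[s]) for s in selected),
                default=0.0,
            )
            return trade_off * representativity - (1 - trade_off) * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        del remaining[best]
    return selected


# Toy embeddings: "a" and "b" are near-duplicates, "c" is related but distinct.
target = [1.0, 1.0]
candidates = {
    "a": [1.0, 0.9],
    "b": [1.0, 0.8],
    "c": [0.2, 1.0],
    "d": [-1.0, -1.0],
}
print(select_external_graphs(target, candidates, k=2))
print(select_external_graphs(target, candidates, k=2, trade_off=1.0))
```

With `trade_off=1.0` the redundancy penalty vanishes and the selection reduces to ranking candidates by similarity to the target; lower values favor candidates that differ from those already chosen.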

