SI, LG · Feb 25, 2024

Towards Fair Graph Anomaly Detection: Problem, Benchmark Datasets, and Evaluation

Neng Kai Nigel Neo, Yeon-Chang Lee, Yiqiao Jin, Sang-Wook Kim, Srijan Kumar

arXiv:2402.15988v2 · 5.96 citations · h-index: 5 · Has Code · CIKM

Originality: Synthesis-oriented

AI Analysis

This work addresses fairness in graph anomaly detection for social media users, but it is incremental, as it primarily contributes datasets and benchmarks rather than new methods.

The paper tackles fair graph anomaly detection by introducing a formal problem definition and two novel datasets built from Reddit and Twitter, with 1.2 million and 400,000 edges, respectively. It shows that these datasets differ significantly from the synthetic ones commonly used, and uses them to investigate performance-fairness trade-offs in existing methods.

The Fair Graph Anomaly Detection (FairGAD) problem aims to accurately detect anomalous nodes in an input graph while avoiding biased predictions against individuals from sensitive subgroups. However, the current literature does not comprehensively discuss this problem, nor does it provide realistic datasets that encompass actual graph structures, anomaly labels, and sensitive attributes. To bridge this gap, we introduce a formal definition of the FairGAD problem and present two novel datasets constructed from the social media platforms Reddit and Twitter. These datasets comprise 1.2 million and 400,000 edges associated with 9,000 and 47,000 nodes, respectively, and leverage political leanings as sensitive attributes and misinformation spreaders as anomaly labels. We demonstrate that our FairGAD datasets significantly differ from the synthetic datasets used by the research community. Using our datasets, we investigate the performance-fairness trade-off in nine existing graph anomaly detection (GAD) and non-graph anomaly detection (AD) methods combined with five state-of-the-art fairness methods. Our code and datasets are available at https://github.com/nigelnnk/FairGAD.
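
The performance-fairness trade-off described above is usually quantified by pairing a detection metric with a group-fairness gap. The sketch below is a minimal illustration under stated assumptions: it treats the sensitive attribute as binary, turns anomaly scores into predictions by flagging the top k nodes, and uses AUC-ROC plus two standard gaps (statistical parity and equal opportunity). These are common choices in the fairness literature, not necessarily the exact metrics or thresholds used in the paper or its repository; the `FairGADGraph` container, the helper functions, and the toy numbers are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class FairGADGraph:
    """Hypothetical container for one FairGAD instance (not the repository's format)."""
    num_nodes: int
    edges: List[Tuple[int, int]]  # edges between user nodes; a GAD detector would consume these
    sensitive: List[int]          # s_v in {0, 1}, e.g. political leaning
    labels: List[int]             # y_v in {0, 1}; 1 = anomaly (misinformation spreader)

    def __post_init__(self) -> None:
        # Every node needs exactly one sensitive attribute and one anomaly label.
        if not (len(self.sensitive) == len(self.labels) == self.num_nodes):
            raise ValueError("sensitive/labels length must equal num_nodes")


def _rate(values: Sequence[int]) -> float:
    """Mean of 0/1 predictions; refuses empty subgroups instead of dividing by zero."""
    if not values:
        raise ValueError("empty subgroup")
    return sum(values) / len(values)


def flag_top_k(scores: Sequence[float], k: int) -> List[int]:
    """Turn anomaly scores into 0/1 predictions by flagging the k highest-scoring nodes."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    top = set(ranked[:k])
    return [1 if i in top else 0 for i in range(len(scores))]


def statistical_parity_gap(pred: Sequence[int], sensitive: Sequence[int]) -> float:
    """|P(yhat=1 | s=0) - P(yhat=1 | s=1)|: gap in flag rates between subgroups."""
    r0 = _rate([p for p, s in zip(pred, sensitive) if s == 0])
    r1 = _rate([p for p, s in zip(pred, sensitive) if s == 1])
    return abs(r0 - r1)


def equal_opportunity_gap(
    pred: Sequence[int], labels: Sequence[int], sensitive: Sequence[int]
) -> float:
    """|P(yhat=1 | y=1, s=0) - P(yhat=1 | y=1, s=1)|: gap in true-positive rates."""
    tpr = [
        _rate([p for p, y, s in zip(pred, labels, sensitive) if y == 1 and s == g])
        for g in (0, 1)
    ]
    return abs(tpr[0] - tpr[1])


def auc_roc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Chance that a random anomaly outscores a random normal node (ties count 1/2)."""
    pos = [sc for sc, y in zip(scores, labels) if y == 1]
    neg = [sc for sc, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy 8-node graph (made-up numbers for illustration, not data from the paper).
g = FairGADGraph(
    num_nodes=8,
    edges=[(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 7)],
    sensitive=[0, 0, 0, 0, 1, 1, 1, 1],
    labels=[1, 1, 0, 0, 1, 0, 0, 0],
)
scores = [0.9, 0.8, 0.3, 0.1, 0.5, 0.2, 0.6, 0.4]  # pretend output of some GAD detector
pred = flag_top_k(scores, k=3)

print(round(auc_roc(scores, g.labels), 3))                 # 0.933
print(statistical_parity_gap(pred, g.sensitive))           # 0.25
print(equal_opportunity_gap(pred, g.labels, g.sensitive))  # 1.0
```

On this toy input the scores rank anomalies well (AUC about 0.93), yet the top-3 cut misses the only anomaly in subgroup 1, so the equal-opportunity gap is maximal. A fairness method would try to shrink such gaps while keeping AUC high; that tension is the trade-off the datasets are meant to expose.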

