AI · Feb 21, 2023

Label Information Enhanced Fraud Detection against Low Homophily in Graphs

arXiv:2302.10407v1 · 70 citations · h-index: 34
Originality: Incremental advance
AI Analysis

This work addresses fraud detection for financial and security applications and offers a significant performance boost in challenging low-homophily scenarios, though it is an incremental advance that builds on existing GNN and label-utilization techniques.

The paper tackles graph-based fraud detection in low-homophily settings, where existing GNN methods struggle, by proposing GAGA, a novel method that integrates label information through group aggregation and learnable encodings and achieves up to a 24.39% improvement over competing methods on public and industrial datasets.
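To make "low homophily" concrete: a common measure is edge homophily, the fraction of edges whose endpoints share a class label. The sketch below computes it, plus a per-class variant, on a tiny invented graph. The label convention (1 = fraud, 0 = benign) and the graph itself are hypothetical, and the paper's exact homophily metric is not given here, so treat these as standard illustrative definitions rather than the authors' formulas.

```python
from typing import Dict, List, Tuple

Edge = Tuple[int, int]


def edge_homophily(edges: List[Edge], labels: Dict[int, int]) -> float:
    """Fraction of (undirected) edges whose two endpoints share a label."""
    if not edges:
        raise ValueError("edge list is empty")
    same = sum(1 for u, v in edges if labels[u] == labels[v])
    return same / len(edges)


def class_homophily(edges: List[Edge], labels: Dict[int, int], cls: int) -> float:
    """Over every edge endpoint labeled `cls`, the fraction whose other
    endpoint is also labeled `cls` (each undirected edge is checked both ways)."""
    hits = total = 0
    for u, v in edges:
        for a, b in ((u, v), (v, u)):
            if labels[a] == cls:
                total += 1
                hits += labels[b] == cls
    return hits / total if total else 0.0


# Toy graph (invented): 1 = fraud, 0 = benign.
# Fraud nodes 3 and 4 link only to benign users.
LABELS = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1}
EDGES = [(0, 1), (1, 2), (0, 3), (2, 4), (3, 1)]

if __name__ == "__main__":
    print(edge_homophily(EDGES, LABELS))
    print(class_homophily(EDGES, LABELS, cls=1))
    print(class_homophily(EDGES, LABELS, cls=0))
```

On this toy graph the overall edge homophily is 0.4, but the fraud class's homophily is 0.0 while the benign class's is about 0.57. A plain mean over a fraud node's neighbors therefore looks benign, which is one intuition for why neighborhood-averaging GNNs struggle in this setting.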

Node classification is a key problem in graph-based fraud detection. Many existing works adopt Graph Neural Networks (GNNs) to enhance fraud detectors. While promising, most current GNN-based fraud detectors fail to generalize to the low homophily setting. In addition, label utilization has been shown to be a significant factor in node classification. However, we find that existing label utilization methods are less effective in fraud detection tasks because of the low homophily in the graphs. In this work, we propose GAGA, a novel Group AGgregation enhanced TrAnsformer, to tackle these challenges. Specifically, group aggregation provides a portable way to cope with the low homophily issue: it explicitly integrates label information to generate distinguishable neighborhood information. Along with group aggregation, we make an attempt toward an end-to-end trainable group encoding, which augments the original feature space with the class labels. We also devise two additional learnable encodings to recognize the structural and relational context. We then combine the group aggregation and the learnable encodings in a Transformer encoder to capture semantic information. Experimental results show that GAGA outperforms other competitive graph-based fraud detectors by up to 24.39% on two widely used public datasets and a real-world industrial dataset from Anonymous. Moreover, group aggregation is shown to outperform other label utilization methods (e.g., C&S, BoT/UniMP) in the low homophily setting.
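The abstract describes group aggregation only at a high level. The following is a minimal sketch of one reading of it, assuming that, per relation, a node's neighbors are split into three groups (benign, fraud, and unlabeled, where "unlabeled" also covers nodes whose labels are hidden during training), each group is mean-pooled, and the pooled vectors plus the node's own features form a token sequence for the Transformer. The function names, the label convention, the 1-hop restriction, and mean pooling are all illustrative assumptions, not the authors' implementation.

```python
from typing import Dict, List, Sequence

Vector = List[float]

# Assumed label convention (hypothetical): 0 = benign, 1 = fraud,
# None = unlabeled or hidden (e.g. validation/test nodes, or the target itself).
GROUPS = (0, 1, None)


def mean_vector(vectors: Sequence[Vector], dim: int) -> Vector:
    """Element-wise mean; an empty group becomes a zero vector."""
    if not vectors:
        return [0.0] * dim
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def group_aggregate(
    node: int,
    neighbors: Dict[str, Dict[int, List[int]]],
    features: Dict[int, Vector],
    known_labels: Dict[int, int],
) -> List[Vector]:
    """Build a token sequence for `node`: its own features first, then one
    mean-pooled vector per (relation, label group), relations in sorted order."""
    dim = len(features[node])
    tokens = [list(features[node])]
    for relation in sorted(neighbors):
        nbrs = neighbors[relation].get(node, [])
        for group in GROUPS:
            # Nodes missing from known_labels map to None, i.e. the unlabeled group.
            members = [features[n] for n in nbrs if known_labels.get(n) == group]
            tokens.append(mean_vector(members, dim))
    return tokens


# Toy multi-relation graph (invented for illustration).
FEATURES = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [2.0, 2.0], 3: [4.0, 0.0]}
NEIGHBORS = {"r1": {0: [1, 2, 3]}, "r2": {0: [2]}}
KNOWN_LABELS = {1: 0, 2: 1}  # node 3 is unlabeled; node 0's own label is withheld

if __name__ == "__main__":
    for token in group_aggregate(0, NEIGHBORS, FEATURES, KNOWN_LABELS):
        print(token)
```

Keeping empty groups as zero vectors fixes the sequence length at 1 + 3 × (number of relations), so every node yields the same shape. In the model described above, a learnable group encoding, a structural encoding, and a relation encoding would be added to each token before the Transformer encoder; this sketch stops at the raw token sequence.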

Code Implementations: 1 repo
