LG AI ML | Jun 14, 2025

Delving into Instance-Dependent Label Noise in Graph Data: A Comprehensive Study and Benchmark

Suyeon Kim, SeongKu Kang, Dongwoo Kim, Jungseul Ok, Hwanjo Yu

arXiv:2506.12468v2 | 11.43 citations | h-index: 17 | Has Code | KDD

Originality: Synthesis-oriented

AI Analysis

This paper addresses label noise in graph neural networks, a concern for both researchers and practitioners, but its contribution is incremental: it focuses on benchmarking rather than proposing a new method.

The paper tackles instance-dependent label noise in graph data, which existing studies overlook. It introduces the BeGIN benchmark, which simulates realistic noise and evaluates noise-handling strategies. The experiments reveal challenges such as LLM-based corruption and point to the need for node-specific parameterization to improve GNN robustness.

Graph Neural Networks (GNNs) have achieved state-of-the-art performance in node classification tasks but struggle with label noise in real-world data. Existing studies on graph learning with label noise commonly rely on class-dependent label noise, overlooking the complexities of instance-dependent noise and falling short of capturing real-world corruption patterns. We introduce BeGIN (Benchmarking for Graphs with Instance-dependent Noise), a new benchmark that provides realistic graph datasets with various noise types and comprehensively evaluates noise-handling strategies across GNN architectures, noisy label detection, and noise-robust learning. To simulate instance-dependent corruptions, BeGIN introduces algorithmic methods and LLM-based simulations. Our experiments reveal the challenges of instance-dependent noise, particularly LLM-based corruption, and underscore the importance of node-specific parameterization to enhance GNN robustness. By comprehensively evaluating noise-handling strategies, BeGIN provides insights into their effectiveness, efficiency, and key performance factors. We expect that BeGIN will serve as a valuable resource for advancing research on label noise in graphs and fostering the development of robust GNN training methods. The code is available at https://github.com/kimsu55/BeGIN.
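The distinction the abstract draws between class-dependent and instance-dependent noise can be illustrated with a toy sketch. This is not BeGIN's corruption procedure: the function names, the centroid-margin rule, and the synthetic features below are all assumptions made for illustration. Class-dependent noise resamples a label from a transition matrix indexed only by the clean class, whereas the instance-dependent variant here flips the nodes whose own features sit closest to another class.

```python
import math
import random


def class_dependent_noise(labels, transition, rng):
    """Resample each label from row transition[y]; node features play no role."""
    classes = range(len(transition))
    return [rng.choices(classes, weights=transition[y])[0] for y in labels]


def instance_dependent_noise(labels, features, num_classes, noise_rate):
    """Flip the most ambiguous nodes to their nearest wrong class.

    Hypothetical rule (an assumption, not the paper's method): ambiguity is the
    gap between the distance to the nearest wrong-class centroid and the
    distance to the node's own class centroid. Assumes every class occurs.
    """
    dim = len(features[0])
    sums = [[0.0] * dim for _ in range(num_classes)]
    counts = [0] * num_classes
    for y, x in zip(labels, features):
        counts[y] += 1
        for j, value in enumerate(x):
            sums[y][j] += value
    centroids = [[s / counts[c] for s in sums[c]] for c in range(num_classes)]

    candidates = []
    for i, (y, x) in enumerate(zip(labels, features)):
        dists = [math.dist(x, centroid) for centroid in centroids]
        wrong = min((c for c in range(num_classes) if c != y),
                    key=lambda c: dists[c])
        candidates.append((dists[wrong] - dists[y], i, wrong))

    # Smallest margin first: these nodes look most like another class.
    candidates.sort()
    noisy = list(labels)
    for _, i, wrong in candidates[: int(noise_rate * len(labels))]:
        noisy[i] = wrong
    return noisy


# Synthetic stand-in for node features and labels (illustrative only).
rng = random.Random(0)
num_classes, n, dim = 3, 200, 4
labels = [rng.randrange(num_classes) for _ in range(n)]
features = [[rng.gauss(2.0 if j == y else 0.0, 1.0) for j in range(dim)]
            for y in labels]

# Each clean class keeps its label with probability 0.8.
transition = [[0.8 if i == j else 0.1 for j in range(num_classes)]
              for i in range(num_classes)]

noisy_cd = class_dependent_noise(labels, transition, rng)
noisy_id = instance_dependent_noise(labels, features, num_classes, noise_rate=0.25)
```

In the class-dependent case, two nodes of the same class are equally likely to be corrupted; in the instance-dependent sketch, corruption concentrates on nodes near class boundaries, which is the kind of pattern a single transition matrix cannot express.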
