Delving into Instance-Dependent Label Noise in Graph Data: A Comprehensive Study and Benchmark
This work addresses label noise in graph neural networks, a concern for both researchers and practitioners. Its contribution is incremental, however, because it benchmarks existing approaches rather than proposing a new method.
The paper tackles instance-dependent label noise in graph data, which existing studies overlook. It introduces the BeGIN benchmark, which simulates realistic noise and evaluates noise-handling strategies. The experiments highlight how difficult LLM-based corruption is to handle and point to node-specific parameterization as a way to improve GNN robustness.
Graph Neural Networks (GNNs) have achieved state-of-the-art performance in node classification tasks but struggle with label noise in real-world data. Existing studies on graph learning with label noise commonly rely on class-dependent label noise, overlooking the complexities of instance-dependent noise and falling short of capturing real-world corruption patterns. We introduce BeGIN (Benchmarking for Graphs with Instance-dependent Noise), a new benchmark that provides realistic graph datasets with various noise types and comprehensively evaluates noise-handling strategies across GNN architectures, noisy label detection, and noise-robust learning. To simulate instance-dependent corruptions, BeGIN introduces algorithmic methods and LLM-based simulations. Our experiments reveal the challenges of instance-dependent noise, particularly LLM-based corruption, and underscore the importance of node-specific parameterization to enhance GNN robustness. By comprehensively evaluating noise-handling strategies, BeGIN provides insights into their effectiveness, efficiency, and key performance factors. We expect that BeGIN will serve as a valuable resource for advancing research on label noise in graphs and fostering the development of robust GNN training methods. The code is available at https://github.com/kimsu55/BeGIN.
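To make the distinction between class-dependent and instance-dependent noise concrete, the following is a minimal sketch, not BeGIN's actual noise-generation procedure. The function names, the random projection `w`, and the softmax over wrong classes are all assumptions chosen for illustration. Under class-dependent noise, every node of a given class shares one row of a transition matrix; under instance-dependent noise, each node gets its own flip distribution computed from its features.

```python
import numpy as np


def class_dependent_noise(labels, transition, rng):
    """Corrupt labels with a class-level transition matrix.

    transition[i, j] = P(noisy label = j | clean label = i), so every node
    of class i is corrupted with the same distribution.
    """
    num_classes = len(transition)
    return np.array([rng.choice(num_classes, p=transition[y]) for y in labels])


def instance_dependent_noise(labels, features, num_classes, noise_rate, rng):
    """Corrupt labels with a per-node distribution derived from node features.

    Illustrative assumption: a random linear projection `w` maps features to
    class scores, and a flipped node draws its wrong label from a softmax
    over the scores of the classes other than its clean class.
    """
    n = len(labels)
    w = rng.standard_normal((features.shape[1], num_classes))  # hypothetical projection
    scores = features @ w
    scores[np.arange(n), labels] = -np.inf          # never "flip" to the clean class
    scores -= scores.max(axis=1, keepdims=True)     # numerical stability
    probs = np.exp(scores)
    probs /= probs.sum(axis=1, keepdims=True)

    flip = rng.random(n) < noise_rate               # which nodes get corrupted
    noisy = labels.copy()
    for i in np.flatnonzero(flip):
        noisy[i] = rng.choice(num_classes, p=probs[i])
    return noisy


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    labels = rng.integers(0, 3, size=10)
    features = rng.standard_normal((10, 4))
    uniform_T = np.full((3, 3), 0.1) + 0.7 * np.eye(3)  # rows sum to 1
    print(class_dependent_noise(labels, uniform_T, rng))
    print(instance_dependent_noise(labels, features, 3, 0.3, rng))
```

In this sketch, two nodes of the same class can have very different chances of being flipped to a particular wrong label, which is the property a single class-level transition matrix cannot express.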