LG | Jun 25, 2025

Demystifying Distributed Training of Graph Neural Networks for Link Prediction

arXiv:2506.20818v1 | 1 citation | h-index: 6 | Has Code | ICDCS
Originality: Incremental advance
AI Analysis

This work addresses scalability issues for researchers and practitioners who use GNNs for link prediction, but the contribution is incremental because it builds on existing distributed frameworks.

The paper tackles performance degradation in distributed training of graph neural networks for link prediction, which it attributes to graph partitioning and to the way negative samples are drawn. It proposes SpLPG, which reduces communication overhead by up to about 80% while mostly preserving accuracy.

Graph neural networks (GNNs) are powerful tools for solving graph-related problems. Distributed GNN frameworks and systems enhance the scalability of GNNs and accelerate model training, yet most are optimized for node classification. Their performance on link prediction remains underexplored. This paper demystifies distributed training of GNNs for link prediction by investigating the issue of performance degradation when each worker trains a GNN on its assigned partitioned subgraph without having access to the entire graph. We discover that the main sources of the issue come from not only the information loss caused by graph partitioning but also the ways of drawing negative samples during model training. While sharing the complete graph information with each worker resolves the issue and preserves link prediction accuracy, it incurs a high communication cost. We propose SpLPG, which effectively leverages graph sparsification to mitigate the issue of performance degradation at a reduced communication cost. Experiment results on several public real-world datasets demonstrate the effectiveness of SpLPG, which reduces the communication overhead by up to about 80% while mostly preserving link prediction accuracy.
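
The two sources of degradation named in the abstract, and the communication trade-off, can be illustrated on a toy graph. What follows is a minimal sketch under stated assumptions, not the paper's implementation: the function names (`partition_edges`, `sample_negatives`, `sparsify_uniform`), the 8-node graph, and the uniform random edge sampler are all hypothetical and chosen for illustration. In particular, the uniform sampler is only a stand-in for "graph sparsification"; how SpLPG actually sparsifies is not specified in this summary.

```python
import random
from itertools import combinations


def partition_edges(edges, assignment):
    """Split edges into per-worker local edges and cross-partition (cut) edges."""
    local, cut = {}, []
    for u, v in edges:
        if assignment[u] == assignment[v]:
            local.setdefault(assignment[u], []).append((u, v))
        else:
            cut.append((u, v))  # invisible to a worker that only sees its own partition
    return local, cut


def sample_negatives(candidate_nodes, known_edges, k, rng):
    """Draw up to k node pairs from candidate_nodes that are not known edges."""
    known = {frozenset(e) for e in known_edges}
    non_edges = [p for p in combinations(sorted(candidate_nodes), 2)
                 if frozenset(p) not in known]
    return rng.sample(non_edges, min(k, len(non_edges)))


def sparsify_uniform(edges, keep_ratio, rng):
    """Keep a uniform random fraction of edges (illustrative stand-in only)."""
    n_keep = max(1, round(keep_ratio * len(edges)))
    return rng.sample(edges, n_keep)


# Toy graph: nodes 0-3 belong to worker 0, nodes 4-7 to worker 1.
edges = [(0, 1), (1, 2), (2, 3), (0, 3),   # inside partition 0
         (4, 5), (5, 6), (6, 7),           # inside partition 1
         (3, 4), (2, 5), (0, 7)]           # cross-partition edges
assignment = {n: 0 if n < 4 else 1 for n in range(8)}
rng = random.Random(0)

local, cut = partition_edges(edges, assignment)
worker0_nodes = [n for n, w in assignment.items() if w == 0]

# Local negative sampling: pairs can only come from inside the partition.
local_negs = sample_negatives(worker0_nodes, local[0], k=2, rng=rng)
# Global negative sampling: pairs can span partitions, but needs the full graph.
global_negs = sample_negatives(range(8), edges, k=2, rng=rng)

# Edges worker 0 would have to receive to see the complete graph,
# versus a sparsified subset of them.
remote_edges = [e for e in edges if e not in local[0]]
shared = sparsify_uniform(remote_edges, keep_ratio=0.5, rng=rng)

print("cut edges lost by partitioning:", cut)
print("local negatives for worker 0:", local_negs)
print("global negatives:", global_negs)
print("edges sent: full =", len(remote_edges), "sparsified =", len(shared))
```

In this sketch, worker 0 loses three cross-partition edges and can only draw negatives from pairs inside its own partition, whereas the global sampler draws from the whole node set. The `keep_ratio` value of 0.5 is arbitrary and is not meant to reproduce the roughly 80% reduction reported in the abstract.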

Code Implementations: 1 repo
