Plexus: Taming Billion-edge Graphs with 3D Parallel Full-graph GNN Training
This work addresses a scalability bottleneck for researchers and practitioners who train on large-scale graph data; it represents a strong, targeted advance rather than a foundational breakthrough.
The paper tackles training graph neural networks (GNNs) on billion-edge graphs, which exceed the memory of a single GPU and whose distributed full-graph training incurs heavy communication overheads. It proposes Plexus, a 3D parallel full-graph training approach that achieves speedups of 2.3-12.5x over the prior state of the art and reduces time-to-solution by up to 54.2x.
Graph neural networks (GNNs) leverage the connectivity and structure of real-world graphs to learn intricate properties and relationships between nodes. Many real-world graphs exceed the memory capacity of a GPU due to their sheer size, and training GNNs on such graphs requires techniques such as mini-batch sampling to scale. The alternative approach of distributed full-graph training suffers from high communication overheads and load imbalance due to the irregular structure of graphs. We propose a three-dimensional (3D) parallel approach for full-graph training that tackles these issues and scales to billion-edge graphs. In addition, we introduce optimizations such as a double permutation scheme for load balancing, and a performance model to predict the optimal 3D configuration of our parallel implementation, Plexus. We evaluate Plexus on six different graph datasets and show scaling results on up to 2048 GPUs of Perlmutter and 1024 GPUs of Frontier. Plexus achieves unprecedented speedups of 2.3-12.5x over the prior state of the art, and a reduction in time-to-solution by 5.2-8.7x on Perlmutter and 7.0-54.2x on Frontier.
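The abstract does not spell out Plexus's performance model. As a rough illustration of how a model can choose a 3D configuration, the sketch below enumerates every way to factor G GPUs into an X-by-Y-by-Z grid, discards configurations whose adjacency block would exceed a per-GPU nonzero budget, and returns the one with the lowest modeled communication volume. The data layout (adjacency A split into an x-by-y block grid, features split z ways), the cost terms, and the names `factorizations_3d`, `toy_comm_volume`, and `choose_config` are hypothetical assumptions for illustration, not the paper's actual model.

```python
from typing import Iterator, Optional, Tuple

Config = Tuple[int, int, int]


def factorizations_3d(g: int) -> Iterator[Config]:
    """Yield every ordered (x, y, z) with x * y * z == g."""
    for x in range(1, g + 1):
        if g % x:
            continue
        rest = g // x
        for y in range(1, rest + 1):
            if rest % y == 0:
                yield x, y, rest // y


def toy_comm_volume(x: int, y: int, z: int, n: int, d: int) -> float:
    """Hypothetical per-GPU communication volume (in elements) for one A @ H.

    Assumed layout, for illustration only: the n-by-n adjacency A is split into
    an x-by-y grid of blocks, and the n-by-d feature matrix H is split z ways
    along its columns.
    """
    # Gather the (n/y)-row slice of H that an A block needs, across x GPUs.
    gather_inputs = (n / y) * (d / z) * (x - 1) / x
    # Sum the partial (n/x)-row outputs produced by the y GPUs in a block row.
    reduce_outputs = (n / x) * (d / z) * (y - 1) / y
    return gather_inputs + reduce_outputs


def choose_config(g: int, n: int, nnz: int, d: int,
                  max_nnz_per_gpu: float) -> Optional[Config]:
    """Return the feasible (x, y, z) with the lowest modeled volume, or None."""
    feasible = (c for c in factorizations_3d(g)
                # Memory filter: each GPU stores about nnz / (x * y) nonzeros of A.
                if nnz / (c[0] * c[1]) <= max_nnz_per_gpu)
    return min(feasible,
               key=lambda c: toy_comm_volume(c[0], c[1], c[2], n, d),
               default=None)


if __name__ == "__main__":
    # Whole graph fits on every GPU: splitting only the features needs no traffic.
    print(choose_config(8, n=1_000, nnz=10_000, d=64, max_nnz_per_gpu=10_000))  # (1, 1, 8)
    # A tighter budget forces x * y >= 4, and the balanced grid wins.
    print(choose_config(8, n=1_000, nnz=10_000, d=64, max_nnz_per_gpu=2_500))   # (2, 2, 2)
```

Under this toy model the local sparse work, roughly 2·nnz·d/G per GPU, is identical for every factorization, so only memory and communication separate the candidates. On real graphs the nonzeros are not spread evenly across blocks, which breaks that symmetry; that is the kind of load imbalance the abstract's double permutation scheme is meant to address.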