LG | Aug 6, 2023

Communication-Free Distributed GNN Training with Vertex Cut

Kaidi Cao, Rui Deng, Shirley Wu, Edward W Huang, Karthik Subbian, Jure Leskovec

Stanford

arXiv:2308.03209v1 | 6.64 citations | h-index: 148

Originality: Highly original

AI Analysis

CoFree-GNN addresses the scalability problem faced by researchers and practitioners training GNNs on billion-scale graphs, where cross-GPU communication limits existing distributed training approaches.

The paper tackles the challenge of training Graph Neural Networks (GNNs) on large graphs by introducing CoFree-GNN, a distributed framework that eliminates cross-GPU communication through Vertex Cut partitioning and achieves up to a 10-fold speedup over state-of-the-art methods.

Training Graph Neural Networks (GNNs) on real-world graphs consisting of billions of nodes and edges is challenging, primarily due to the substantial memory needed to store the graph and its intermediate node and edge features. There is also a pressing need to speed up the training process. A common approach to achieving a speedup is to divide the graph into many smaller subgraphs, which are then distributed across multiple GPUs in one or more machines and processed in parallel. However, existing distributed methods require frequent and substantial cross-GPU communication, leading to significant time overhead and progressively diminishing scalability. Here, we introduce CoFree-GNN, a novel distributed GNN training framework that significantly speeds up the training process by implementing communication-free training. The framework uses Vertex Cut partitioning: rather than partitioning the graph by cutting the edges between partitions, Vertex Cut partitions the edges and duplicates the node information to preserve the graph structure. Furthermore, the framework maintains high model accuracy by incorporating a reweighting mechanism to handle the distorted graph distribution that arises from the duplicated nodes. We also propose a modified DropEdge technique to further speed up the training process. Using an extensive set of experiments on real-world networks, we demonstrate that CoFree-GNN speeds up the GNN training process by up to 10 times over existing state-of-the-art GNN training approaches.
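To make the Vertex Cut idea concrete, here is a minimal Python sketch of partitioning edges while duplicating nodes. It is an illustration under stated assumptions, not the paper's implementation: the round-robin edge assignment, the function names, and the inverse-replica-count weights are all hypothetical stand-ins. The abstract does not specify how CoFree-GNN assigns edges or computes its reweighting.

```python
from collections import defaultdict


def vertex_cut_partition(edges, num_parts):
    """Assign every edge to exactly one partition; endpoints are replicated.

    Round-robin assignment is an illustrative assumption, not the
    partitioner used by CoFree-GNN.
    """
    part_edges = [[] for _ in range(num_parts)]
    for i, (u, v) in enumerate(edges):
        part_edges[i % num_parts].append((u, v))
    # Each partition keeps a local copy of every node its edges touch.
    part_nodes = [sorted({n for e in pe for n in e}) for pe in part_edges]
    return part_edges, part_nodes


def replica_counts(part_nodes):
    """Count how many partitions hold a copy of each node."""
    counts = defaultdict(int)
    for nodes in part_nodes:
        for n in nodes:
            counts[n] += 1
    return dict(counts)


def node_weights(part_nodes, counts):
    """Hypothetical reweighting: each replica gets weight 1 / (number of replicas).

    This keeps a duplicated node from being over-counted in the loss; the
    paper's actual reweighting mechanism may differ.
    """
    return [{n: 1.0 / counts[n] for n in nodes} for nodes in part_nodes]


# Small example graph: a 4-cycle with one chord.
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
part_edges, part_nodes = vertex_cut_partition(edges, num_parts=2)
counts = replica_counts(part_nodes)
weights = node_weights(part_nodes, counts)

for p, (pe, w) in enumerate(zip(part_edges, weights)):
    print(f"partition {p}: edges={pe} node_weights={w}")
```

Because every edge lives in exactly one partition and each partition holds local copies of the endpoints it needs, a worker can run message passing on its own subgraph without fetching neighbor features from other workers. The per-replica weights are one simple way to compensate for nodes that appear in several partitions.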

