LG DC ML
May 7, 2020

Reducing Communication in Graph Neural Network Training

Alok Tripathy, Katherine Yelick, Aydin Buluc

arXiv:2005.03300v321.7123 citations
Has Code

Originality: Incremental advance

AI Analysis

This work addresses the scalability problem faced by researchers and practitioners who train GNNs on large graphs. It represents an incremental improvement in parallel training efficiency.

The paper tackles the high communication cost of scaling Graph Neural Network (GNN) training. It introduces parallel algorithms that asymptotically reduce communication compared to previous methods, and it demonstrates training on over a hundred GPUs on multiple datasets, including a protein network with over a billion edges.

Graph Neural Networks (GNNs) are powerful and flexible neural networks that use the naturally sparse connectivity information of the data. GNNs represent this connectivity as sparse matrices, which have lower arithmetic intensity and thus higher communication costs compared to dense matrices, making GNNs harder to scale to high concurrencies than convolutional or fully-connected neural networks. We introduce a family of parallel algorithms for training GNNs and show that they can asymptotically reduce communication compared to previous parallel GNN training methods. We implement these algorithms, which are based on 1D, 1.5D, 2D, and 3D sparse-dense matrix multiplication, using torch.distributed on GPU-equipped clusters. Our algorithms optimize communication across the full GNN training pipeline. We train GNNs on over a hundred GPUs on multiple datasets, including a protein network with over a billion edges.
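To make the 1D case concrete, the following is a minimal single-process sketch of a block-row sparse-dense matrix multiplication, the kernel the abstract names as the basis of the algorithms. It is not the authors' implementation and does not use torch.distributed: the "ranks" are simulated in a loop, the broadcast is a plain Python reference, and every name (`block_bounds`, `spmm_1d_simulated`, `words_received`) is hypothetical. It assumes each rank owns a contiguous block of rows of both the sparse adjacency matrix and the dense feature matrix, and that each rank's feature block is broadcast in turn.

```python
from typing import Dict, List, Tuple

Sparse = Dict[Tuple[int, int], float]  # (row, col) -> nonzero value
Dense = List[List[float]]              # row-major dense matrix


def block_bounds(n: int, p: int, rank: int) -> Tuple[int, int]:
    """Half-open row range [lo, hi) owned by `rank` when n rows are split over p ranks."""
    base, extra = divmod(n, p)
    lo = rank * base + min(rank, extra)
    hi = lo + base + (1 if rank < extra else 0)
    return lo, hi


def spmm_serial(a: Sparse, h: Dense, n: int) -> Dense:
    """Reference result: out = A @ H computed on one process."""
    f = len(h[0])
    out = [[0.0] * f for _ in range(n)]
    for (i, j), v in a.items():
        for k in range(f):
            out[i][k] += v * h[j][k]
    return out


def spmm_1d_simulated(a: Sparse, h: Dense, n: int, p: int) -> Tuple[Dense, int]:
    """Simulate a 1D block-row SpMM over p ranks (assumed layout, see lead-in).

    Returns the assembled product and the total number of dense-matrix
    words that ranks would have received from other ranks.
    """
    f = len(h[0])
    bounds = [block_bounds(n, p, r) for r in range(p)]

    # Each rank keeps only its own block of rows of A and of H.
    local_a = [{(i, j): v for (i, j), v in a.items() if lo <= i < hi}
               for lo, hi in bounds]
    local_h = [[row[:] for row in h[lo:hi]] for lo, hi in bounds]
    local_out = [[[0.0] * f for _ in range(hi - lo)] for lo, hi in bounds]

    words_received = 0
    for src in range(p):                 # one broadcast stage per rank
        s_lo, s_hi = bounds[src]
        h_block = local_h[src]           # stands in for a broadcast from `src`
        for r in range(p):
            if r != src:
                words_received += len(h_block) * f
            r_lo, _ = bounds[r]
            # Multiply the column block of local A that matches the received rows of H.
            for (i, j), v in local_a[r].items():
                if s_lo <= j < s_hi:
                    for k in range(f):
                        local_out[r][i - r_lo][k] += v * h_block[j - s_lo][k]

    # Concatenate the per-rank row blocks to compare against the serial result.
    out = [row for r in range(p) for row in local_out[r]]
    return out, words_received
```

In this simulated layout every rank receives all rows of the dense matrix it does not already own, so the counted traffic is (p - 1) * n * f words in total for n rows and f feature columns. That count describes only this sketch; it is not a cost figure taken from the paper.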
