LG, DC · May 5, 2021

Scalable Graph Neural Network Training: The Case for Sampling

arXiv:2105.02315v1 · 33 citations
Originality: Synthesis-oriented
AI Analysis

The paper addresses scalability issues faced by researchers and practitioners working with large-scale graph data, but its contribution is incremental because it reviews and compares existing approaches.

The paper tackles the challenge of efficiently training Graph Neural Networks (GNNs) on large graphs that exceed single-device capacity, arguing that sample-based training is a more promising approach than whole-graph training for scalability.

Graph Neural Networks (GNNs) are a new and increasingly popular family of deep neural network architectures to perform learning on graphs. Training them efficiently is challenging due to the irregular nature of graph data. The problem becomes even more challenging when scaling to large graphs that exceed the capacity of single devices. Standard approaches to distributed DNN training, such as data and model parallelism, do not directly apply to GNNs. Instead, two different approaches have emerged in the literature: whole-graph and sample-based training. In this paper, we review and compare the two approaches. Scalability is challenging with both approaches, but we make a case that research should focus on sample-based training since it is a more promising approach. Finally, we review recent systems supporting sample-based training.
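To make the sample-based idea concrete, the following is a minimal sketch of uniform fan-out neighbor sampling, one common way to build a mini-batch for a multi-layer GNN. It is not taken from the paper: the function name `sample_neighbors`, the adjacency-dict representation, and the `fanouts` parameter are all assumptions chosen for illustration.

```python
import random


def sample_neighbors(adj, seeds, fanouts, rng):
    """Sample a multi-hop subgraph around the seed vertices of a mini-batch.

    adj:     dict mapping each vertex to a list of its neighbors (assumed format)
    seeds:   vertices whose outputs the mini-batch will compute
    fanouts: maximum number of neighbors to sample per vertex, one entry per hop
    rng:     a random.Random instance, so sampling is reproducible

    Returns one edge list per hop, ordered from the seeds outward.
    Each edge is (source, destination): the source's features flow to the destination.
    """
    frontier = list(seeds)
    layers = []
    for fanout in fanouts:
        edges = []
        next_frontier = set()
        for v in frontier:
            neighbors = adj.get(v, [])
            # Sample without replacement, capped by the vertex's degree.
            k = min(fanout, len(neighbors))
            for u in rng.sample(neighbors, k):
                edges.append((u, v))
                next_frontier.add(u)
        layers.append(edges)
        # Sorting keeps the traversal order deterministic for a given rng.
        frontier = sorted(next_frontier)
    return layers


# Hypothetical toy graph: 4 vertices, undirected edges stored in both directions.
adj = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1, 3], 3: [0, 2]}
layers = sample_neighbors(adj, seeds=[0], fanouts=[2, 2], rng=random.Random(0))
```

In this sketch the fan-out caps how many edges each vertex contributes per hop, so the work per mini-batch depends on the batch size and fan-outs rather than on the size of the whole graph. That is the property that lets a mini-batch fit on one device when the full graph does not.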

