DGI: Easy and Efficient Inference for GNNs
This work addresses a critical bottleneck for practitioners deploying GNNs by making inference efficient and easy to implement, though the contribution is incremental because it builds on existing layer-wise inference methods.
The paper tackles the problem of inefficient Graph Neural Network (GNN) inference: with the common node-wise approach, model evaluation can account for up to 94% of the end-to-end training time because of neighbor explosion. Its answer is DGI, a system that automatically translates a model's training code into layer-wise execution, with reported speedups of over 1,000x in experiments.
While many systems have been developed to train Graph Neural Networks (GNNs), efficient model inference and evaluation remain to be addressed. For instance, using the widely adopted node-wise approach, model evaluation can account for up to 94% of the time in the end-to-end training process due to neighbor explosion, which means that a node accesses its multi-hop neighbors. On the other hand, layer-wise inference avoids the neighbor explosion problem by conducting inference layer by layer such that the nodes only need their one-hop neighbors in each layer. However, implementing layer-wise inference requires substantial engineering effort because users need to manually decompose a GNN model into layers for computation and split the workload into batches to fit into device memory. In this paper, we develop Deep Graph Inference (DGI) -- a system for easy and efficient GNN model inference, which automatically translates the training code of a GNN model for layer-wise execution. DGI is general for various GNN models and different kinds of inference requests, and supports out-of-core execution on large graphs that cannot fit in CPU memory. Experimental results show that DGI consistently outperforms node-wise inference across different datasets and hardware settings, and the speedup can be over 1,000x.
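The difference between node-wise and layer-wise inference can be made concrete with a toy sketch. The code below is not DGI's implementation or API; it is a minimal pure-Python illustration under stated assumptions: a small ring graph, scalar node features, and a simplified mean-aggregation layer. All names (`make_ring_graph`, `layer`, `node_wise`, `layer_wise`) are hypothetical. It counts how many layer evaluations each strategy performs when computing outputs for every node.

```python
# Toy comparison of node-wise and layer-wise GNN inference.
# Assumptions (not from the paper): ring graph, scalar features,
# a simplified mean-aggregation layer, and hypothetical function names.

def make_ring_graph(n, k):
    """Adjacency lists: node i has neighbors i+1 .. i+k (mod n)."""
    return {i: [(i + j) % n for j in range(1, k + 1)] for i in range(n)}

def layer(self_h, neigh_hs, weight):
    """One simplified GNN layer: scale (self + neighbor mean), then ReLU."""
    mean = sum(neigh_hs) / len(neigh_hs)
    return max(0.0, weight * (self_h + mean))

def node_wise(graph, feats, weights, node, depth, counter):
    """Compute one node's output by expanding its multi-hop neighborhood.

    Every call recomputes the hidden states it needs, so work grows
    roughly as (degree + 1) ** depth per target node (neighbor explosion).
    """
    if depth == 0:
        return feats[node]
    self_h = node_wise(graph, feats, weights, node, depth - 1, counter)
    neigh_hs = [node_wise(graph, feats, weights, v, depth - 1, counter)
                for v in graph[node]]
    counter[0] += 1  # one layer evaluation
    return layer(self_h, neigh_hs, weights[depth - 1])

def layer_wise(graph, feats, weights, batch_size, counter):
    """Compute all nodes one layer at a time, in batches.

    Each layer only reads one-hop neighbors from the previous layer's
    outputs, so every hidden state is computed exactly once.
    """
    nodes = sorted(graph)
    h = dict(feats)
    for w in weights:
        nxt = {}
        for start in range(0, len(nodes), batch_size):
            for u in nodes[start:start + batch_size]:
                nxt[u] = layer(h[u], [h[v] for v in graph[u]], w)
                counter[0] += 1  # one layer evaluation
        h = nxt
    return h

graph = make_ring_graph(n=8, k=3)
feats = {i: float(i - 3) for i in graph}   # some negative inputs exercise ReLU
weights = [0.5, 0.25, 1.0]                 # three layers

node_counter = [0]
node_out = {u: node_wise(graph, feats, weights, u, len(weights), node_counter)
            for u in graph}

layer_counter = [0]
layer_out = layer_wise(graph, feats, weights, batch_size=4, counter=layer_counter)

node_calls, layer_calls = node_counter[0], layer_counter[0]
print("node-wise layer evaluations: ", node_calls)
print("layer-wise layer evaluations:", layer_calls)
```

Both strategies produce the same outputs, but on this 8-node, degree-3, 3-layer toy the node-wise version performs 168 layer evaluations against 24 for the layer-wise version. The gap widens with depth and degree, which is the neighbor explosion the abstract describes. The manual steps visible in `layer_wise` (splitting the model into per-layer calls and batching nodes) are the kind of engineering work that, according to the abstract, DGI automates.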