TC-GNN: Bridging Sparse GNN Computation and Dense Tensor Cores on GPUs
This work addresses the performance of GNNs used in domains such as e-commerce and reports an average 1.70x speedup over DGL. The contribution is incremental in that it builds on existing GPU hardware (Tensor Core Units) and an existing framework (PyTorch).
The authors tackled the performance bottleneck of sparse and irregular graph-based operations in graph neural networks (GNNs) by proposing TC-GNN, a GPU acceleration framework that bridges sparse GNN computation with dense tensor cores, achieving an average 1.70x speedup over the state-of-the-art DGL framework.
Recently, graph neural networks (GNNs), as the backbone of graph-based machine learning, have demonstrated great success in various domains (e.g., e-commerce). However, the performance of GNNs is usually unsatisfactory due to the highly sparse and irregular graph-based operations. To this end, we propose TC-GNN, the first GNN acceleration framework based on GPU Tensor Core Units (TCUs). The core idea is to reconcile the "Sparse" GNN computation with the high-performance "Dense" TCUs. Specifically, we conduct an in-depth analysis of the sparse operations in mainstream GNN computing frameworks. We introduce a novel sparse graph translation technique to facilitate TCU processing of the sparse GNN workload. We implement an effective CUDA core and TCU collaboration design to fully utilize GPU resources. We integrate TC-GNN with the PyTorch framework for high programmability. Rigorous experiments show an average 1.70x speedup over the state-of-the-art DGL framework across various models and datasets.
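The abstract does not spell out how the sparse graph translation works. The following is a minimal sketch, assuming the translation groups adjacency-matrix rows into fixed-height windows and renumbers the non-empty columns of each window so that they pack into contiguous dense tiles. The function name `count_tiles`, the parameters `window` and `tile_cols`, and the 4x4 tile shape are hypothetical and chosen only for illustration; they are not taken from the TC-GNN code.

```python
def count_tiles(edges, num_rows, window=4, tile_cols=4):
    """Compare dense-tile counts before and after column condensing.

    edges: iterable of (row, col) positions of nonzeros in the adjacency matrix.
    window, tile_cols: hypothetical tile height and width (assumed values).
    Returns (naive_tiles, condensed_tiles, remaps), where remaps[i] maps the
    original column ids of row window i to their condensed ids.
    """
    num_windows = -(-num_rows // window)  # ceiling division
    cols_per_window = [set() for _ in range(num_windows)]
    for r, c in edges:
        cols_per_window[r // window].add(c)

    naive = 0
    condensed = 0
    remaps = []
    for cols in cols_per_window:
        # Without translation: every fixed column block that holds a
        # nonzero must be loaded as its own dense tile.
        naive += len({c // tile_cols for c in cols})
        # With translation (assumed scheme): renumber the non-empty
        # columns 0..k-1, then cut the compacted window into tiles.
        remap = {c: i for i, c in enumerate(sorted(cols))}
        remaps.append(remap)
        condensed += -(-len(cols) // tile_cols)
    return naive, condensed, remaps


# Toy 8-row graph with scattered neighbors.
edges = [(0, 0), (0, 9), (1, 5), (2, 14), (3, 9),
         (4, 2), (5, 3), (6, 2), (7, 11)]
naive, condensed, remaps = count_tiles(edges, num_rows=8)
print(naive, condensed)  # 6 tiles without condensing, 2 with
```

In this toy case the first row window touches columns 0, 5, 9, and 14, which fall in four different column blocks; after renumbering they fit in a single 4x4 tile. Fewer, denser tiles is the property a dense matrix unit needs to be used efficiently on sparse input.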