LG, AR | Dec 16, 2024

Accelerating Sparse Graph Neural Networks with Tensor Core Optimization

arXiv:2412.12218v2 | 5 citations | h-index: 1

Originality: Incremental advance

AI Analysis

The paper addresses performance bottlenecks in GNNs used in domains such as social networks and bioinformatics. The contribution is incremental, since it builds on prior CUDA Core and Tensor Core acceleration techniques.

This paper tackles the challenge of accelerating sparse graph neural networks (GNNs) by proposing FTC-GNN, a framework that optimizes GPU resource utilization through collaborative use of CUDA and Tensor Cores, achieving speedups of up to 7.10x compared to existing methods like DGL and PyG.

Graph neural networks (GNNs) have seen extensive application in domains such as social networks, bioinformatics, and recommendation systems. However, the irregularity and sparsity of graph data challenge traditional computing methods, which are insufficient to meet the performance demands of GNNs. Recent research has explored parallel acceleration using CUDA Cores and Tensor Cores, but significant challenges persist: (1) kernel fusion leads to false high utilization, failing to treat CUDA and Tensor Cores as independent resources, and (2) heterogeneous cores have distinct computation preferences, causing inefficiencies. To address these issues, this paper proposes FTC-GNN, a novel acceleration framework that efficiently utilizes CUDA and Tensor Cores for GNN computation. FTC-GNN introduces (1) a collaborative design that enables the parallel utilization of CUDA and Tensor Cores and (2) a sparse-to-dense transformation strategy that assigns dense matrix operations to Tensor Cores while leveraging CUDA Cores for data management and sparse edge processing. This design optimizes GPU resource utilization and improves computational efficiency. Experimental results demonstrate the effectiveness of FTC-GNN using GCN and AGNN models across various datasets. For GCN, FTC-GNN achieves speedups of 4.90x, 7.10x, and 1.17x compared to DGL, PyG, and TC-GNN, respectively. For AGNN, it achieves speedups of 5.32x, 2.92x, and 1.02x, establishing its superiority in accelerating GNN computations.
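To make the sparse-to-dense idea concrete, here is a minimal pure-Python sketch under stated assumptions. It is not the FTC-GNN implementation: the function names `condense_window` and `aggregate`, the `window` parameter, and the CSR inputs are all hypothetical, and the graph is assumed unweighted with no duplicate edges. For each window of rows, the nonzero columns are compacted into a small dense tile (the kind of data management the abstract assigns to CUDA Cores), and the neighbor sum is then computed as a dense tile-by-feature product (the kind of operation that would map to Tensor Cores).

```python
# Illustrative sketch only; names and window size are assumptions, not from the paper.

def condense_window(indptr, indices, row_start, row_end):
    """Compact the nonzero columns of CSR rows [row_start, row_end) into a dense tile."""
    cols = sorted({indices[k]
                   for r in range(row_start, row_end)
                   for k in range(indptr[r], indptr[r + 1])})
    col_pos = {c: j for j, c in enumerate(cols)}
    # Dense 0/1 tile: (rows in window) x (unique neighbors in window).
    tile = [[0.0] * len(cols) for _ in range(row_end - row_start)]
    for r in range(row_start, row_end):
        for k in range(indptr[r], indptr[r + 1]):
            tile[r - row_start][col_pos[indices[k]]] = 1.0
    return cols, tile


def aggregate(indptr, indices, features, window=2):
    """Sum neighbor features using one dense tile product per row window."""
    n = len(indptr) - 1
    dim = len(features[0])
    out = [[0.0] * dim for _ in range(n)]
    for start in range(0, n, window):
        end = min(start + window, n)
        cols, tile = condense_window(indptr, indices, start, end)
        gathered = [features[c] for c in cols]  # gather only the rows the tile needs
        # Dense product: tile (rows x cols) times gathered (cols x dim).
        for i, row in enumerate(tile):
            for j, a in enumerate(row):
                for d in range(dim):
                    out[start + i][d] += a * gathered[j][d]
    return out
```

The point of the compaction step is that the dense tile is only as wide as the number of distinct neighbors in the window, so the dense product does far less wasted work on zeros than a product against the full adjacency matrix would.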
