GNNIE: GNN Inference Engine with Load-balancing and Graph-Specific Caching
This work addresses inefficient GNN inference on the large, sparse graphs found in real-world applications, reporting substantial performance improvements over existing hardware.
The paper tackles the challenge of accelerating graph neural network (GNN) inference by proposing GNNIE, an accelerator that addresses workload imbalance and random memory access, achieving average speedups of 21233x over a CPU and 699x over a GPU across several GNN models and datasets.
Graph neural network (GNN) analysis engines are vital for real-world problems that use large graph models. Challenges for a GNN hardware platform include the ability to (a) host a variety of GNNs, (b) handle high sparsity in the input vertex feature vectors and the graph adjacency matrix, along with the accompanying random memory access patterns, and (c) maintain load-balanced computation in the face of uneven workloads induced by high sparsity and power-law vertex degree distributions. This paper proposes GNNIE, an accelerator designed to run a broad range of GNNs. It tackles workload imbalance by (i) splitting vertex feature operands into blocks, (ii) reordering and redistributing computations, and (iii) using a novel flexible MAC architecture. It adopts a graph-specific, degree-aware caching policy that is well suited to real-world graph characteristics. The policy enhances on-chip data reuse and avoids random memory accesses to DRAM. GNNIE achieves average speedups of 21233x over a CPU and 699x over a GPU across multiple datasets on graph attention networks (GATs), graph convolutional networks (GCNs), GraphSAGE, GINConv, and DiffPool. Compared to prior approaches, GNNIE achieves an average speedup of 35x over HyGCN (which cannot implement GATs) for GCN, GraphSAGE, and GINConv, and, using 3.4x fewer processing units, an average speedup of 2.1x over AWB-GCN (which runs only GCNs).
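To make the two central ideas of the abstract concrete, the following is a minimal software sketch, not GNNIE's hardware design. It assumes that "degree-aware caching" can be approximated by keeping the highest-degree vertices on-chip, and that distributing feature blocks across processing elements can be approximated by a greedy longest-first assignment on per-block nonzero counts. All function names, the toy edge list, and the block sizes are hypothetical and chosen only for illustration.

```python
from collections import Counter


def vertex_degrees(edges):
    """Count the degree of each vertex in an undirected edge list."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def degree_aware_cache(edges, capacity):
    """Select the `capacity` highest-degree vertices (ties broken by vertex id).

    Assumption: this stands in for a degree-aware policy; the paper's
    actual policy is not specified in the abstract.
    """
    deg = vertex_degrees(edges)
    ranked = sorted(deg, key=lambda v: (-deg[v], v))
    return set(ranked[:capacity])


def hit_rate(edges, cached):
    """Fraction of edge-endpoint accesses that fall in the cached set."""
    accesses = 2 * len(edges)
    hits = sum((u in cached) + (v in cached) for u, v in edges)
    return hits / accesses if accesses else 0.0


def balance_blocks(nonzeros_per_block, num_pes):
    """Greedily assign feature blocks to processing elements (PEs).

    Blocks are visited from most to fewest nonzeros, and each goes to the
    currently least-loaded PE. This is a generic heuristic used here as an
    assumption, not the reordering scheme from the paper.
    """
    loads = [0] * num_pes
    assignment = [[] for _ in range(num_pes)]
    order = sorted(range(len(nonzeros_per_block)),
                   key=lambda b: (-nonzeros_per_block[b], b))
    for b in order:
        pe = loads.index(min(loads))  # least-loaded PE, lowest index on ties
        loads[pe] += nonzeros_per_block[b]
        assignment[pe].append(b)
    return assignment, loads


# Toy graph with one hub (vertex 0), loosely mimicking a skewed degree distribution.
edges = [(0, 1), (0, 2), (0, 3), (0, 4), (1, 2), (5, 6)]
cached = degree_aware_cache(edges, capacity=2)
rate = hit_rate(edges, cached)

# Hypothetical nonzero counts for six feature blocks, spread over two PEs.
assignment, loads = balance_blocks([9, 1, 1, 1, 4, 2], num_pes=2)

print(cached, rate)       # {0, 1} 0.5
print(assignment, loads)  # [[0], [4, 5, 1, 2, 3]] [9, 9]
```

In this toy case, caching only two of the seven vertices serves half of all endpoint accesses, because the hub accounts for most of the edges. The greedy assignment likewise evens out a skewed set of block workloads across the two PEs. Real graphs and a real accelerator would behave differently, so the numbers here illustrate the intuition only.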