DC LG Aug 23, 2019

AWB-GCN: A Graph Convolutional Network Accelerator with Runtime Workload Rebalancing

Tong Geng, Ang Li, Runbin Shi, Chunshu Wu, Tianqi Wang, Yanfei Li, Pouya Haghi, Antonino Tumeo, Shuai Che, Steve Reinhardt, Martin Herbordt

arXiv:1908.10834v1024.435 citations

Originality: Incremental advance

AI Analysis

This work addresses performance bottlenecks in hardware acceleration for GCNs, which matter for applications involving graph data. The contribution is rated incremental because it builds on existing GCN acceleration methods.

The paper tackles workload imbalance when accelerating Graph Convolutional Network (GCN) inference on large, unbalanced real-world graphs. It proposes AWB-GCN, an accelerator with runtime workload rebalancing, which raises processing element utilization by 7.7x on average and achieves speedups of 3255x over CPUs, 80.3x over GPUs, and 5.1x over a prior GCN accelerator.

Deep learning systems have been successfully applied to Euclidean data such as images, video, and audio. In many applications, however, information and their relationships are better expressed with graphs. Graph Convolutional Networks (GCNs) appear to be a promising approach to efficiently learn from graph data structures, having shown advantages in many critical applications. As with other deep learning modalities, hardware acceleration is critical. The challenge is that real-world graphs are often extremely large and unbalanced; this poses significant performance demands and design challenges. In this paper, we propose Autotuning-Workload-Balancing GCN (AWB-GCN) to accelerate GCN inference. To address the issue of workload imbalance in processing real-world graphs, three hardware-based autotuning techniques are proposed: dynamic distribution smoothing, remote switching, and row remapping. In particular, AWB-GCN continuously monitors the sparse graph pattern, dynamically adjusts the workload distribution among a large number of processing elements (up to 4K PEs), and, after converging, reuses the ideal configuration. Evaluation is performed using an Intel D5005 FPGA with five commonly-used datasets. Results show that 4K-PE AWB-GCN can significantly elevate PE utilization by 7.7x on average and demonstrate considerable performance speedups over CPUs (3255x), GPUs (80.3x), and a prior GCN accelerator (5.1x).
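To make the workload-imbalance problem concrete, the following is a minimal software sketch, not the paper's hardware design. It assumes each sparse-matrix row costs as many cycles as it has nonzeros, and that a batch finishes only when the busiest PE finishes. The function names (`static_assignment`, `rebalanced_assignment`, `utilization`) are hypothetical, and the rebalancing step is a plain offline greedy heuristic (heaviest row to least-loaded PE) used here as a stand-in; it does not implement dynamic distribution smoothing, remote switching, or row remapping as described in the abstract.

```python
import heapq


def static_assignment(row_nnz, n_pes):
    """Assign row i to PE (i mod n_pes), ignoring how many nonzeros it has."""
    loads = [0] * n_pes
    for row, nnz in enumerate(row_nnz):
        loads[row % n_pes] += nnz
    return loads


def rebalanced_assignment(row_nnz, n_pes):
    """Greedy stand-in for rebalancing: heaviest rows first, each to the
    currently least-loaded PE. This is an illustrative heuristic only."""
    loads = [0] * n_pes
    heap = [(0, pe) for pe in range(n_pes)]  # (current load, PE index)
    heapq.heapify(heap)
    for nnz in sorted(row_nnz, reverse=True):
        load, pe = heapq.heappop(heap)
        loads[pe] = load + nnz
        heapq.heappush(heap, (loads[pe], pe))
    return loads


def utilization(loads):
    """Fraction of PE-cycles doing useful work when all PEs wait for the slowest."""
    return sum(loads) / (len(loads) * max(loads))


if __name__ == "__main__":
    # Toy skewed graph: two dense rows among many sparse ones.
    row_nnz = [8, 1, 1, 1, 8, 1, 1, 1]
    before = static_assignment(row_nnz, n_pes=4)
    after = rebalanced_assignment(row_nnz, n_pes=4)
    print(before, utilization(before))  # [16, 2, 2, 2] 0.34375
    print(after, utilization(after))    # [8, 8, 3, 3] 0.6875
```

In this toy case both dense rows land on the same PE under the static mapping, so three of the four PEs sit idle most of the time; moving one dense row doubles utilization. A runtime scheme such as the one the abstract describes has to discover this kind of skew while the computation is in flight, rather than from a sorted list known in advance.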
