HePGA: A Heterogeneous Processing-in-Memory based GNN Training Accelerator
This work addresses energy and performance bottlenecks in GNN training and is aimed at AI hardware designers, though it is incremental, building on existing PIM and heterogeneous computing concepts.
The paper tackles the challenge of accelerating Graph Neural Network (GNN) training by proposing HePGA, a 3D heterogeneous Processing-in-Memory (PIM) accelerator that optimizes the mapping of GNN kernels across different PIM devices and planar tiers. It achieves up to 3.8x higher energy efficiency and 6.8x higher compute efficiency than existing PIM-based architectures without sacrificing accuracy.
Processing-in-Memory (PIM) architectures offer a promising approach to accelerating Graph Neural Network (GNN) training and inference. However, many PIM devices exist, such as ReRAM, FeFET, PCM, MRAM, and SRAM, each offering unique trade-offs in power, latency, area, and non-idealities. A heterogeneous manycore architecture enabled by 3D integration can combine multiple PIM devices on a single platform to enable energy-efficient and high-performance GNN training. In this work, we propose a 3D heterogeneous PIM-based accelerator for GNN training, referred to as HePGA. We leverage the unique characteristics of GNN layers and their associated computing kernels to optimize their mapping onto different PIM devices as well as planar tiers. Our experimental analysis shows that HePGA outperforms existing PIM-based architectures by up to 3.8x in energy efficiency (TOPS/W) and up to 6.8x in compute efficiency (TOPS/mm²), without sacrificing GNN prediction accuracy. Finally, we demonstrate the applicability of HePGA to accelerating inference of emerging transformer models.
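The abstract says that HePGA maps GNN kernels onto PIM devices according to each device's trade-offs, but it does not describe the mapping procedure. The snippet below is a purely illustrative sketch and should not be read as HePGA's actual algorithm. It greedily assigns each kernel to the device with the lowest energy-delay product (EDP), considering only devices whose write endurance can absorb the kernel's expected writes. Every device parameter, kernel name, and write count here is a hypothetical placeholder, not a value from the paper. The idea that graph-structure operands are rarely rewritten during training, while weight-related kernels are rewritten every iteration, is likewise an illustrative assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    name: str
    energy_pj_per_op: float   # hypothetical placeholder, not from the paper
    latency_ns_per_op: float  # hypothetical placeholder, not from the paper
    write_endurance: float    # hypothetical max writes per cell


@dataclass(frozen=True)
class Kernel:
    name: str
    ops: float     # operations per training iteration (hypothetical)
    writes: float  # expected cell writes over the training run (hypothetical)


def edp(device: Device, kernel: Kernel) -> float:
    """Toy energy-delay product of running `kernel` on `device`."""
    energy = device.energy_pj_per_op * kernel.ops
    delay = device.latency_ns_per_op * kernel.ops
    return energy * delay


def map_kernels(kernels, devices):
    """Greedily pick, per kernel, the lowest-EDP device that tolerates its writes."""
    mapping = {}
    for k in kernels:
        feasible = [d for d in devices if d.write_endurance >= k.writes]
        if not feasible:
            raise ValueError(f"no device can sustain the writes of {k.name}")
        mapping[k.name] = min(feasible, key=lambda d: edp(d, k)).name
    return mapping


# Placeholder device table: ReRAM is cheapest per op but least write-tolerant,
# SRAM is the most expensive per op but effectively unlimited in writes.
DEVICES = [
    Device("ReRAM", energy_pj_per_op=0.1, latency_ns_per_op=10.0, write_endurance=1e6),
    Device("FeFET", energy_pj_per_op=0.3, latency_ns_per_op=5.0, write_endurance=1e9),
    Device("SRAM", energy_pj_per_op=1.0, latency_ns_per_op=2.0, write_endurance=1e16),
]

# Hypothetical kernels: aggregation reads a fixed graph structure (few writes),
# while combination and weight-gradient kernels rewrite weights repeatedly.
KERNELS = [
    Kernel("aggregation", ops=1e9, writes=1e3),
    Kernel("combination", ops=1e9, writes=1e7),
    Kernel("weight_gradient", ops=1e8, writes=1e12),
]

if __name__ == "__main__":
    print(map_kernels(KERNELS, DEVICES))
    # {'aggregation': 'ReRAM', 'combination': 'FeFET', 'weight_gradient': 'SRAM'}
```

A realistic mapper would also weigh area, device non-idealities (which the abstract notes can affect accuracy), and placement across 3D tiers. The greedy per-kernel choice here ignores interactions between kernels.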