ABI: A tightly integrated, unified, sparsity-aware, reconfigurable, compute near-register file/cache GPU architecture with light-weight softmax for deep learning, linear algebra, and Ising compute
This work addresses the memory bottleneck in GPU computing across deep learning and scientific workloads, reporting 6-16x speedup and 6-13x energy savings over the MIAOW GPU.
The paper presents a unified near-memory GPU architecture that achieves 6-16x speedup and 6-13x energy savings over the MIAOW GPU across diverse workloads (CNNs, GCNs, linear programming, LLMs, Ising), with additional energy savings of about 1.5x from a sparsity-aware circuit and about 1.6x from a lightweight softmax circuit.
We present a tightly integrated and unified near-memory GPU architecture that delivers 6 to 16 times speedup and 6 to 13 times energy savings over the MIAOW GPU across Convolutional Neural Network, Graph Convolutional Network, Linear Programming, Large Language Model, and Ising workloads. The design includes a custom sparsity-aware near-memory circuit that provides about 1.5 times energy savings and a lightweight softmax circuit that provides about 1.6 times energy savings. The architecture supports reconfigurable compute at resolutions up to INT16 with dynamic resolution updates, and it scales efficiently across problem sizes. ABI-enabled MI300 and Blackwell systems achieve about 4.5 times speedup over their respective baseline MI300 and Blackwell systems.
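The abstract does not describe how the sparsity-aware circuit works, so the following is only a minimal software sketch of the general zero-skipping idea that such designs commonly rely on: a multiply-accumulate (MAC) is not issued when either operand is zero. The function name `sparse_dot`, the operand values, and the MAC counter are all hypothetical and illustrative; they are not taken from the paper and do not model the ABI hardware or its reported energy figures.

```python
def sparse_dot(a, b):
    """Integer dot product that skips a MAC whenever either operand is zero.

    Returns the accumulated result and the number of MACs actually issued.
    Illustrative only: a software analogy, not the paper's circuit.
    """
    acc = 0
    macs = 0
    for x, y in zip(a, b):
        if x == 0 or y == 0:
            continue  # zero-skipping: the product is zero, so no MAC is issued
        acc += x * y
        macs += 1
    return acc, macs


# Hypothetical small-integer operands (well within the INT16 range).
weights = [0, 3, 0, -2, 0, 0, 5, 0]
activations = [7, 1, 4, 0, 2, 9, 2, 0]

result, macs = sparse_dot(weights, activations)
print(result, macs, len(weights))
```

In this toy input only 2 of the 8 operand pairs are both nonzero, so the skipped work grows with sparsity. How that translates into energy on real hardware depends on circuit details the abstract does not give.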