AR
Apr 4

ABI: A tightly integrated, unified, sparsity-aware, reconfigurable, compute near-register file/cache GPU architecture with light-weight softmax for deep learning, linear algebra, and Ising compute

arXiv:2602.14262
90.65 citations
h-index: 5
AI Analysis

This work addresses the memory bottleneck in GPU computing for a broad range of deep learning and scientific workloads, offering significant performance and energy improvements.

The paper presents a unified near-memory GPU architecture that achieves 6-16x speedup and 6-13x energy savings across diverse workloads (CNNs, GCNs, linear programming, LLMs, Ising) compared to MIAOW GPU, with additional 1.5x and 1.6x energy savings from sparsity-aware and lightweight softmax circuits, respectively.

We present a tightly integrated and unified near-memory GPU architecture that delivers 6 to 16 times speedup and 6 to 13 times energy savings across Convolutional Neural Networks, Graph Convolutional Networks, Linear Programming, Large Language Models, and Ising workloads compared to MIAOW GPU. The design includes a custom sparsity-aware near-memory circuit providing about 1.5 times energy savings, and a lightweight softmax circuit providing about 1.6 times energy savings. The architecture supports reconfigurable compute up to INT16 with dynamic resolution updates and scales efficiently across problem sizes. ABI-enabled MI300 and Blackwell systems achieve about 4.5 times speedup over baseline MI300 and Blackwell.
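As a rough software illustration of what sparsity-aware compute means in general, the sketch below skips multiply-accumulate operations whenever either operand is zero and reports how many were actually performed. It is a generic zero-skipping example, not the paper's near-memory circuit; the function name `sparse_dot_int16` and the choice to count skipped work as a proxy for saved energy are assumptions made here for illustration only.

```python
# Hypothetical illustration of zero-skipping; not the ABI circuit itself.
INT16_MIN, INT16_MAX = -32768, 32767


def sparse_dot_int16(weights, activations):
    """Return (dot product, number of MACs performed), skipping zero operands."""
    if len(weights) != len(activations):
        raise ValueError("operand vectors must have the same length")

    acc = 0   # Python integers do not overflow, so the accumulator is exact
    macs = 0  # multiply-accumulates that were not skipped
    for w, a in zip(weights, activations):
        # Assumed operand width: INT16, the upper bound mentioned in the abstract.
        if not (INT16_MIN <= w <= INT16_MAX and INT16_MIN <= a <= INT16_MAX):
            raise ValueError("operands must fit in INT16")
        if w == 0 or a == 0:
            continue  # a zero operand contributes nothing, so skip the MAC
        acc += w * a
        macs += 1
    return acc, macs


if __name__ == "__main__":
    result, performed = sparse_dot_int16([0, 3, 0, -2], [5, 4, 7, 0])
    print(result, performed)
```

In this toy input only one of the four operand pairs has two non-zero values, so three multiplications are skipped. How such skipping translates into the roughly 1.5 times energy savings reported above depends on the hardware details, which this sketch does not model.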

