ARSIApr 13

GEN-Graph: Heterogeneous PIM Accelerator for General Computational Patterns in Graph-based Dynamic Programming

arXiv:2604.153615.9h-index: 5
Predicted impact top 51% in AR · last 90 daysOriginality Highly original
AI Analysis

This work addresses the challenge of efficiently accelerating diverse graph-based dynamic programming workloads, which is important for genomics and network analytics applications.

GEN-Graph is a heterogeneous PIM accelerator that combines matrix and traversal tiles to efficiently support both compute-bound and memory-bound graph-based dynamic programming. It achieves up to 42.8x speedup and 392x energy efficiency over NVIDIA H100 GPU for all-pairs shortest path, and up to 2.56x throughput improvement over state-of-the-art accelerators for sequence-to-graph alignment.

While graph-based dynamic programming (DP) is a cornerstone of genomics and network analytics, its efficiency is hampered by fundamentally conflicting computational patterns. Matrix-centric DP drives regular, compute-bound network analytics, while topology-centric DP handles irregular, memory-bound genomic traversals. These two categories of DP have substantially different computation patterns and dataflows, which makes it difficult for a single homogeneous processing-in-memory (PIM) architecture to efficiently support both. This work presents GEN-Graph, a novel heterogeneous PIM chiplet that integrates two types of specialized compute tiles within a 2.5D package: Matrix-tile, a processing-using-memory (PUM) tile optimized for matrix-centric workloads, such as all-pairs shortest path (APSP); and traversal-tile, a processing-near-memory (PNM) tile optimized for traversal-centric DP workloads, such as DNA sequence alignment. Our hardware-software co-design employs recursive partitioning and reconfigurable windowed bit-parallel logic to ensure exact computation. Results show the matrix tile achieves 42.8x speedup and 392x energy efficiency over the NVIDIA H100 GPU for APSP. For sequence-to-graph alignment, the traversal tile sustains 2.56 million reads/s (short-reads) and 39.3 thousand reads/s (long-reads), outperforming state-of-the-art accelerators by up to 2.56x in throughput. GEN-Graph provides the first scalable, exact solution for general DP dataflows by matching hardware specialization to algorithmic structure.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes