AR Sep 27, 2025

SnipSnap: A Joint Compression Format and Dataflow Co-Optimization Framework for Efficient Sparse LLM Accelerator Design

arXiv:2509.17072

h-index: 10

Originality: Highly original

AI Analysis

This work addresses the high computation and memory costs of large language model inference, a concern for AI hardware designers. It offers a new co-optimization approach that builds incrementally on prior design space exploration (DSE) methods.

The paper tackles efficient sparse LLM accelerator design by proposing SnipSnap, a joint compression format and dataflow co-optimization framework. It reports 18.24% average memory energy savings from format optimization, along with speedups of 2248.3× and 21.0× over the Sparseloop and DiMO-Sparse frameworks, respectively.

The growing scale of large language models (LLMs) has intensified demands on computation and memory, making efficient inference a key challenge. While sparsity can reduce these costs, existing design space exploration (DSE) frameworks often overlook compression formats, a key factor for leveraging sparsity on accelerators. This paper proposes SnipSnap, a joint compression format and dataflow co-optimization framework for efficient sparse LLM accelerator design. SnipSnap introduces: (1) a hierarchical compression format encoding to expand the design space; (2) an adaptive compression engine for selecting formats under diverse sparsity; and (3) a progressive co-search workflow that jointly optimizes dataflow and compression formats. SnipSnap achieves 18.24% average memory energy savings via format optimization, along with 2248.3× and 21.0× speedups over the Sparseloop and DiMO-Sparse frameworks, respectively.
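To illustrate why the choice of compression format depends on sparsity, here is a minimal sketch that compares the storage cost of three textbook formats and picks the cheapest one for a given density. It is not SnipSnap's adaptive compression engine: the function names, the 8-bit value width, and the bits-only cost model are all assumptions made for illustration, whereas the paper optimizes for metrics such as memory energy.

```python
import math

# Hypothetical storage-cost models (in bits) for a 1-D tensor of n elements
# with nnz nonzeros. These are textbook formats, not the paper's encoding.

def dense_bits(n, nnz, value_bits=8):
    # Every element is stored, zero or not.
    return n * value_bits

def bitmap_bits(n, nnz, value_bits=8):
    # One presence bit per position, plus the nonzero values.
    return n + nnz * value_bits

def coordinate_bits(n, nnz, value_bits=8):
    # One explicit index per nonzero, plus the nonzero values.
    index_bits = max(1, math.ceil(math.log2(n)))
    return nnz * (index_bits + value_bits)

FORMATS = {
    "dense": dense_bits,
    "bitmap": bitmap_bits,
    "coordinate": coordinate_bits,
}

def pick_format(n, density, value_bits=8):
    """Return the cheapest format name and the per-format costs in bits."""
    nnz = round(n * density)
    costs = {name: fn(n, nnz, value_bits) for name, fn in FORMATS.items()}
    best = min(costs, key=costs.get)
    return best, costs

if __name__ == "__main__":
    for density in (1.0, 0.5, 0.01):
        best, costs = pick_format(1024, density)
        print(f"density={density}: best={best}, costs={costs}")
```

Under these assumptions, the dense layout is cheapest at full density, the bitmap at moderate density, and the coordinate list when very few elements are nonzero. This is the kind of sparsity-dependent trade-off a format selection step has to resolve.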

