AR AI LG | Jan 21, 2024

AttentionLego: An Open-Source Building Block For Spatially-Scalable Large Language Model Accelerator With Processing-In-Memory Technology

Rongqing Cong, Wenyang He, Mingxuan Li, Bangning Luo, Zebin Yang, Yuchao Yang, Ru Huang, Bonan Yan

arXiv:2401.11459v12.34 citations | Has Code

Originality: Incremental advance

AI Analysis

This work addresses the I/O bandwidth bottleneck of self-attention in LLM hardware acceleration, which is relevant to AI researchers and hardware engineers. The advance is incremental because it builds on existing PIM and accelerator concepts.

The paper tackles the high I/O bandwidth demand of self-attention modules in Transformer-based LLMs by developing AttentionLego, a fully customized accelerator built on Processing-In-Memory technology. It is intended as an open-source building block for spatially scalable LLM processors.

Large language models (LLMs) with Transformer architectures have become prominent in natural language processing, multimodal generative artificial intelligence, and agent-oriented artificial intelligence. The self-attention module is the dominant sub-structure inside Transformer-based LLMs. Computing it on general-purpose graphics processing units (GPUs) places a heavy demand on I/O bandwidth, because intermediate calculation results must be transferred between memories and processing units. To tackle this challenge, this work develops a fully customized vanilla self-attention accelerator, AttentionLego, as the basic building block for constructing spatially expandable LLM processors. AttentionLego provides a basic implementation in fully customized digital logic incorporating Processing-In-Memory (PIM) technology. It is based on PIM-based matrix-vector multiplication and a look-up table-based Softmax design. The open-source code is available online: https://bonany.cc/attentionleg.
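
To make the look-up table-based Softmax idea concrete, here is a minimal software sketch. It is not the paper's hardware design: the bit widths, the score scale, and the names `SCORE_BITS`, `FRAC_BITS`, `SCALE`, `EXP_LUT`, and `lut_softmax` are all assumptions chosen for illustration. It only shows the general technique of replacing the exponential with a precomputed table indexed by the distance of each quantized score from the row maximum.

```python
import math

# All constants below are illustrative assumptions, not values from the paper.
SCORE_BITS = 8        # assumed width of a quantized attention score
FRAC_BITS = 16        # assumed fixed-point precision of each table entry
SCALE = 1.0 / 16.0    # assumed real value represented by one score step

# One entry per possible non-negative difference d = max - score:
# EXP_LUT[d] approximates exp(-d * SCALE) as a fixed-point integer.
EXP_LUT = [
    round(math.exp(-d * SCALE) * (1 << FRAC_BITS))
    for d in range(1 << SCORE_BITS)
]


def lut_softmax(scores):
    """Softmax over integer scores in [0, 2**SCORE_BITS - 1].

    The exponential is never evaluated here; each weight is a table lookup.
    """
    if not scores:
        raise ValueError("scores must be non-empty")
    limit = (1 << SCORE_BITS) - 1
    if any(s < 0 or s > limit for s in scores):
        raise ValueError("score outside the assumed quantization range")
    top = max(scores)
    # Subtracting from the maximum keeps every index inside the table.
    weights = [EXP_LUT[top - s] for s in scores]
    total = sum(weights)  # at least EXP_LUT[0], so never zero
    return [w / total for w in weights]


def reference_softmax(scores):
    """Floating-point softmax on the same scaled scores, for comparison."""
    top = max(scores)
    exps = [math.exp((s - top) * SCALE) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    row = [10, 200, 255, 0]
    print(lut_softmax(row))
    print(reference_softmax(row))
```

Under these assumptions the table has 256 entries, so the per-element work reduces to a subtraction, a lookup, and a final normalization. Table entries for large differences round to zero, which is the usual accuracy cost of a fixed-point table.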
