AR | AI | Jan 22, 2024

BETA: Binarized Energy-Efficient Transformer Accelerator at the Edge

arXiv:2401.11851v2 | 10 citations | h-index: 10 | ISCAS
Originality: Incremental advance
AI Analysis

This work addresses energy-efficient deployment of binary Transformers for edge computing, representing an incremental improvement over existing accelerators.

The paper targets two obstacles to deploying binary Transformers at the edge: inefficient quantized matrix multiplication (QMM) and the energy overhead of multi-precision activations. The resulting accelerator, BETA, achieves an average energy efficiency of 174 GOPS/W on a ZCU102 FPGA, 1.76x to 21.92x higher than prior FPGA-based accelerators.

Existing binary Transformers are promising for edge deployment due to their compact model size, low computational complexity, and considerable inference accuracy. However, deploying binary Transformers on prior processors is challenging because of inefficient execution of quantized matrix multiplication (QMM) and the energy overhead caused by multi-precision activations. To tackle these challenges, we first develop a computation flow abstraction method for binary Transformers that improves QMM execution efficiency by optimizing the computation order. Furthermore, a binarized energy-efficient Transformer accelerator, namely BETA, is proposed to enable efficient deployment at the edge. Notably, BETA features a configurable QMM engine that accommodates the diverse activation precisions of binary Transformers and offers high parallelism and high speed for QMMs with impressive energy efficiency. Experimental results on a ZCU102 FPGA show that BETA achieves an average energy efficiency of 174 GOPS/W, which is 1.76x to 21.92x higher than prior FPGA-based accelerators, showing BETA's good potential for edge Transformer acceleration.
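To make the QMM workload concrete, the following is a minimal sketch of a matrix multiplication between multi-bit quantized activations and binary weights. It is not BETA's computation flow or engine design, which the abstract does not detail. It only illustrates the general property that weights restricted to {-1, +1} let each dot product be computed with additions instead of multiplications. The function names, the sign-based binarization, and the unsigned uniform activation quantizer are all assumptions made for illustration.

```python
def binarize(weights):
    """Map real-valued weights to {-1, +1} by sign (assumption: 0 maps to +1)."""
    return [[1 if w >= 0 else -1 for w in row] for row in weights]


def quantize(acts, bits):
    """Uniformly quantize activations clipped to [0, 1] into unsigned `bits`-bit integers.

    `bits` stands in for the configurable activation precision; the
    quantizer itself is a hypothetical choice, not taken from the paper.
    """
    levels = (1 << bits) - 1
    return [[round(min(max(a, 0.0), 1.0) * levels) for a in row] for row in acts]


def qmm_reference(acts_q, w_bin):
    """Plain integer matrix multiply: acts_q (M x K) times w_bin (K x N)."""
    m, k, n = len(acts_q), len(w_bin), len(w_bin[0])
    return [
        [sum(acts_q[i][p] * w_bin[p][j] for p in range(k)) for j in range(n)]
        for i in range(m)
    ]


def qmm_add_only(acts_q, w_bin):
    """Same product using only additions and one doubling per output.

    With w in {-1, +1}: sum(a * w) = 2 * (sum of a where w = +1) - (sum of all a).
    The doubling is a one-bit left shift in hardware.
    """
    k, n = len(w_bin), len(w_bin[0])
    out = []
    for row in acts_q:
        row_total = sum(row)  # shared by every output column of this row
        out.append([
            2 * sum(row[p] for p in range(k) if w_bin[p][j] == 1) - row_total
            for j in range(n)
        ])
    return out


if __name__ == "__main__":
    acts_q = quantize([[0.0, 0.7, 1.0], [1.0, 1.0, 0.0]], bits=2)
    w_bin = binarize([[0.3, -0.2], [-1.0, 0.0], [0.7, -0.5]])
    assert qmm_add_only(acts_q, w_bin) == qmm_reference(acts_q, w_bin)
    print(qmm_add_only(acts_q, w_bin))
```

Changing `bits` changes only the range of the integer activations; the add-only identity holds for any activation precision, which is one reason a single engine can plausibly serve several precisions.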

