AR · LG · Aug 5, 2024

Toward Attention-based TinyML: A Heterogeneous Accelerated Architecture and Automated Deployment Flow

Philip Wiese, Gamze İslamoğlu, Moritz Scherer, Luka Macan, Victor J. B. Jung, Alessio Burrello, Francesco Conti, Luca Benini

arXiv:2408.02473v2 · 2.38 citations · h-index: 22 · Has Code

Originality: Incremental advance

AI Analysis

The paper addresses the challenge of deploying Transformer models in TinyML. It introduces a heterogeneous architecture that couples RISC-V processors with hardwired accelerators and reaches 2960 GOp/J energy efficiency and 154 GOp/s throughput for 8-bit inference.

The approach enables efficient Transformer inference on resource-constrained edge devices and is an incremental advance in TinyML hardware and deployment.
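Dividing the two headline figures gives the power draw they imply, and inverting the efficiency figure gives the energy per operation. The sketch below works through that arithmetic. It assumes both numbers were measured at the same operating point (0.65 V, 22 nm FD-SOI). The abstract suggests this but does not state it outright. The function names are hypothetical helpers for illustration only.

```python
# Headline figures from the summary. ASSUMPTION: both refer to the same
# operating point (0.65 V, 22 nm FD-SOI); the abstract does not say so explicitly.
THROUGHPUT_GOPS = 154.0   # throughput in GOp/s
EFFICIENCY_GOPJ = 2960.0  # energy efficiency in GOp/J


def implied_power_mw(throughput_gops: float, efficiency_gopj: float) -> float:
    """Power = throughput / efficiency: (GOp/s) / (GOp/J) = J/s = W, returned in mW."""
    return throughput_gops / efficiency_gopj * 1e3


def energy_per_op_pj(efficiency_gopj: float) -> float:
    """Energy per operation = 1 / efficiency, converted from joules to picojoules."""
    return 1.0 / (efficiency_gopj * 1e9) * 1e12


if __name__ == "__main__":
    print(f"implied power: {implied_power_mw(THROUGHPUT_GOPS, EFFICIENCY_GOPJ):.1f} mW")
    print(f"energy per op: {energy_per_op_pj(EFFICIENCY_GOPJ):.3f} pJ")
```

Under that assumption, the figures imply roughly 52 mW of power and about 0.34 pJ per operation. Both values are consistent with a tinyML power envelope.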

One of the challenges for Tiny Machine Learning (tinyML) is keeping up with the evolution of Machine Learning models from Convolutional Neural Networks to Transformers. We address this by leveraging a heterogeneous architectural template coupling RISC-V processors with hardwired accelerators supported by an automated deployment flow. We demonstrate Attention-based models in a tinyML power envelope with an octa-core cluster coupled with an accelerator for quantized Attention. Our deployment flow enables end-to-end 8-bit Transformer inference, achieving leading-edge energy efficiency and throughput of 2960 GOp/J and 154 GOp/s (0.65 V, 22 nm FD-SOI technology).
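End-to-end 8-bit inference means tensors such as the attention queries and keys are stored as int8 and multiplied with wide integer accumulators. The results are then requantized back to int8 before the next stage. The sketch below shows that generic pattern for the attention-score product S = Q·Kᵀ. It is an illustration under stated assumptions and does not reproduce the paper's accelerator datapath. The fixed-point multiplier/shift requantization scheme and the function names are hypothetical. The softmax and the product with V would follow the same accumulate-then-requantize pattern but are omitted here.

```python
from typing import List

Matrix = List[List[int]]


def clamp_int8(x: int) -> int:
    """Saturate a value to the signed 8-bit range [-128, 127]."""
    return max(-128, min(127, x))


def requantize(acc: int, mult: int, shift: int) -> int:
    """Scale a wide accumulator back to int8 using a fixed-point multiply
    followed by a rounding right shift (round half up); requires shift >= 1.
    HYPOTHETICAL scheme: real deployments choose mult/shift per layer."""
    scaled = (acc * mult + (1 << (shift - 1))) >> shift
    return clamp_int8(scaled)


def int8_attention_scores(q: Matrix, k: Matrix, mult: int, shift: int) -> Matrix:
    """Compute requantized int8 scores S = requant(Q @ K^T).
    q has shape [seq_q][d] and k has shape [seq_k][d]; all entries are int8."""
    scores: Matrix = []
    for q_row in q:
        row = []
        for k_row in k:
            # Dot product in a wide accumulator (int32 in typical hardware).
            acc = sum(a * b for a, b in zip(q_row, k_row))
            row.append(requantize(acc, mult, shift))
        scores.append(row)
    return scores


if __name__ == "__main__":
    q = [[10, -20, 30], [1, 2, 3]]
    k = [[5, 5, 5], [-10, 0, 10]]
    print(int8_attention_scores(q, k, mult=1, shift=4))
```

With `mult=1` and `shift=4`, the example divides each raw dot product by 16 with rounding and prints `[[6, 13], [2, 1]]`. Accumulating in a wide integer before requantizing is what keeps the int8 products from overflowing.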

