AR CV LG Feb 17, 2023

ViTA: A Vision Transformer Inference Accelerator for Edge Applications

Shashank Nag, Gourav Datta, Souvik Kundu, Nitin Chandrachoodan, Peter A. Beerel

arXiv:2302.09108v19.250 citations h-index: 34

Originality: Incremental advance

AI Analysis

This paper addresses the problem of efficient vision transformer inference on edge devices. It is an incremental improvement that adapts existing accelerator concepts to a new domain.

The paper tackles the challenge of deploying compute-heavy vision transformer models on resource-constrained edge devices by proposing ViTA, a configurable hardware accelerator that achieves nearly 90% hardware utilization efficiency, 0.88W power consumption at 150 MHz, and reasonable frame rates.

Vision Transformer models, such as ViT, Swin Transformer, and Transformer-in-Transformer, have recently gained significant traction in computer vision tasks due to their ability to capture the global relation between features, which leads to superior performance. However, they are compute-heavy and difficult to deploy on resource-constrained edge devices. Existing hardware accelerators, including those for the closely related BERT transformer models, do not target highly resource-constrained environments. In this paper, we address this gap and propose ViTA - a configurable hardware accelerator for inference of vision transformer models, targeting resource-constrained edge computing devices and avoiding repeated off-chip memory accesses. We employ a head-level pipeline and inter-layer MLP optimizations, and can support several commonly used vision transformer models with changes solely in our control logic. We achieve nearly 90% hardware utilization efficiency on most vision transformer models, report a power of 0.88W when synthesised with a clock of 150 MHz, and get reasonable frame rates - all of which make ViTA suitable for edge applications.
