LG · CL · Jun 20, 2023

Exploring the Performance and Efficiency of Transformer Models for NLP on Mobile Devices

arXiv:2306.11426v1 · 8 citations · h-index: 25
Originality: Synthesis-oriented
AI Analysis

This work addresses the problem of integrating high-accuracy Transformer models into mobile applications, but its contribution is incremental: it evaluates existing models rather than proposing new solutions.

The paper tackles the challenge of deploying Transformer models on mobile devices by benchmarking their performance. It finds that they are not accelerator-friendly and require software and hardware optimisations for efficient on-device execution.

Deep learning (DL) is characterised by its dynamic nature, with new deep neural network (DNN) architectures and approaches emerging every few years, driving the field's advancement. At the same time, the ever-increasing use of mobile devices (MDs) has resulted in a surge of DNN-based mobile applications. Although traditional architectures, like CNNs and RNNs, have been successfully integrated into MDs, this is not the case for Transformers, a relatively new model family that has achieved new levels of accuracy across AI tasks, but poses significant computational challenges. In this work, we aim to make steps towards bridging this gap by examining the current state of Transformers' on-device execution. To this end, we construct a benchmark of representative models and thoroughly evaluate their performance across MDs with different computational capabilities. Our experimental results show that Transformers are not accelerator-friendly and indicate the need for software and hardware optimisations to achieve efficient deployment.

