LG · CV · Jun 30, 2021

Improving the Efficiency of Transformers for Resource-Constrained Devices

arXiv:2106.16006v1 · 31 citations
AI Analysis

This paper addresses the problem of deploying large Transformer models on mobile and low-power devices. It represents an incremental improvement achieved through parameter clustering.

The paper tackles the inefficiency of Transformers on resource-constrained devices by clustering model parameters. This yields a more than 4x reduction in data transfer from main memory, up to 22% speedup, and 39% energy savings, with less than 0.1% accuracy loss.

Transformers provide promising accuracy and have become popular in various domains such as natural language processing and computer vision. However, due to their massive number of model parameters and their memory and computation requirements, they are not suitable for resource-constrained low-power devices. Even with high-performance and specialized devices, the memory bandwidth can become a performance-limiting bottleneck. In this paper, we present a performance analysis of state-of-the-art vision transformers on several devices. We propose to reduce the overall memory footprint and memory transfers by clustering the model parameters. We show that by using only 64 clusters to represent model parameters, it is possible to reduce the data transfer from the main memory by more than 4x, achieve up to 22% speedup and 39% energy savings on mobile devices with less than 0.1% accuracy loss.
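The clustering idea in the abstract can be illustrated with a minimal sketch. It assumes plain 1-D k-means (Lloyd's algorithm) over a flat list of 32-bit float weights, with evenly spaced initial centroids. The abstract does not say which clustering algorithm, initialisation, or index packing the authors use, so the function names `cluster_weights` and `compression_ratio` and all of their parameters are hypothetical. Each weight is replaced by an index into a 64-entry codebook, so only the small index has to be fetched from main memory.

```python
import math
import random


def cluster_weights(weights, n_clusters=64, n_iters=20):
    """Approximate 1-D k-means over a flat list of floats.

    Returns (codebook, indices) such that codebook[indices[i]]
    approximates weights[i]. Requires n_clusters >= 2.
    """
    lo, hi = min(weights), max(weights)
    # Assumption: centroids start evenly spaced across the weight range.
    step = (hi - lo) / (n_clusters - 1)
    codebook = [lo + k * step for k in range(n_clusters)]
    indices = [0] * len(weights)
    for _ in range(n_iters):
        # Assignment step: pick the nearest centroid for each weight.
        for i, w in enumerate(weights):
            indices[i] = min(range(n_clusters),
                             key=lambda k: abs(w - codebook[k]))
        # Update step: move each centroid to the mean of its members.
        sums = [0.0] * n_clusters
        counts = [0] * n_clusters
        for w, k in zip(weights, indices):
            sums[k] += w
            counts[k] += 1
        for k in range(n_clusters):
            if counts[k]:  # empty clusters keep their previous centroid
                codebook[k] = sums[k] / counts[k]
    return codebook, indices


def compression_ratio(n_weights, n_clusters=64, weight_bits=32):
    """Ratio of original size to (packed indices + codebook) size.

    Assumption: indices are bit-packed at ceil(log2(n_clusters)) bits each
    and the codebook entries keep the original weight precision.
    """
    index_bits = max(1, math.ceil(math.log2(n_clusters)))
    original = n_weights * weight_bits
    compressed = n_weights * index_bits + n_clusters * weight_bits
    return original / compressed


if __name__ == "__main__":
    random.seed(0)
    # Synthetic stand-in for one layer's weights (not real model data).
    weights = [random.gauss(0.0, 0.05) for _ in range(2000)]
    codebook, indices = cluster_weights(weights, n_iters=10)
    worst = max(abs(w - codebook[k]) for w, k in zip(weights, indices))
    print(f"worst-case reconstruction error: {worst:.5f}")
    print(f"size ratio for 1M weights: {compression_ratio(1_000_000):.2f}x")
```

Under these assumptions, 64 clusters need 6-bit indices, so the size ratio on the clustered weights alone approaches 32/6, roughly 5.3x. This is arithmetic on the sketch, not a reproduction of the paper's measured figure, which depends on which tensors are clustered and how indices are stored.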

