CV · Oct 3, 2023

PPT: Token Pruning and Pooling for Efficient Vision Transformers

arXiv:2310.01812v3 · 41 citations · h-index: 14 · Has Code
Originality: Incremental advance
AI Analysis

This work addresses efficiency barriers for deploying Vision Transformers in real-world applications, representing an incremental improvement over existing token reduction methods.

The paper tackles the high computational complexity of Vision Transformers by proposing PPT, a framework that adaptively combines token pruning and pooling to reduce redundant tokens, achieving over 37% FLOPs reduction and over 45% throughput improvement for DeiT-S on ImageNet with no accuracy drop.

Vision Transformers (ViTs) have emerged as powerful models in the field of computer vision, delivering superior performance across various vision tasks. However, the high computational complexity poses a significant barrier to their practical applications in real-world scenarios. Motivated by the fact that not all tokens contribute equally to the final predictions and fewer tokens bring less computational cost, reducing redundant tokens has become a prevailing paradigm for accelerating vision transformers. However, we argue that it is not optimal to either only reduce inattentive redundancy by token pruning, or only reduce duplicative redundancy by token merging. To this end, in this paper we propose a novel acceleration framework, namely token Pruning & Pooling Transformers (PPT), to adaptively tackle these two types of redundancy in different layers. By heuristically integrating both token pruning and token pooling techniques in ViTs without additional trainable parameters, PPT effectively reduces the model complexity while maintaining its predictive accuracy. For example, PPT reduces over 37% FLOPs and improves the throughput by over 45% for DeiT-S without any accuracy drop on the ImageNet dataset. The code is available at https://github.com/xjwu1024/PPT and https://github.com/mindspore-lab/models/
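The two kinds of token reduction named in the abstract can be illustrated with a minimal sketch. This is not the PPT implementation: the function names `prune_tokens` and `pool_tokens`, the use of a per-token importance score, the cosine-similarity pairing, and the simple averaging of merged tokens are all assumptions made for illustration. The rule PPT uses to choose between pruning and pooling in each layer is not described here and is not modeled.

```python
import math


def prune_tokens(tokens, scores, keep):
    """Token pruning (illustrative): keep the `keep` highest-scoring tokens.

    `scores` is a hypothetical per-token importance score, e.g. derived
    from attention. The original token order is preserved.
    """
    ranked = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    kept = sorted(ranked[:keep])
    return [tokens[i] for i in kept]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0


def pool_tokens(tokens, merges):
    """Token pooling (illustrative): greedily average the most similar pair.

    Each merge removes one token, so the output has `merges` fewer tokens
    (or stops early once fewer than two tokens remain).
    """
    tokens = [list(t) for t in tokens]  # work on a copy
    for _ in range(merges):
        if len(tokens) < 2:
            break
        pairs = (
            (i, j)
            for i in range(len(tokens))
            for j in range(i + 1, len(tokens))
        )
        i, j = max(pairs, key=lambda p: cosine(tokens[p[0]], tokens[p[1]]))
        tokens[i] = [(x + y) / 2 for x, y in zip(tokens[i], tokens[j])]
        del tokens[j]
    return tokens


# Toy example: four 2-D tokens with made-up importance scores.
toy_tokens = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.5, 0.5]]
toy_scores = [0.4, 0.3, 0.2, 0.1]

after_pruning = prune_tokens(toy_tokens, toy_scores, keep=3)  # 3 tokens remain
after_pooling = pool_tokens(after_pruning, merges=1)          # 2 tokens remain
```

In this toy run, pruning discards the lowest-scoring token (inattentive redundancy), and pooling then merges the two near-duplicate tokens into one (duplicative redundancy). Neither step adds trainable parameters, which matches the property the abstract claims for PPT.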

Code Implementations: 2 repos
