CV | May 29, 2023

DiffRate: Differentiable Compression Rate for Efficient Vision Transformers

arXiv:2305.17997v1 | 91 citations | Has Code
Originality: Highly original
AI Analysis

This paper addresses the challenge of manually tuning compression rates in vision transformers to improve efficiency, though it is incremental in that it builds on existing token compression techniques.

The paper tackles the problem of efficiently compressing tokens in vision transformers by proposing DiffRate, a method that automatically learns layer-wise compression rates, achieving a 40% FLOPs reduction and 1.5x throughput improvement with only a 0.16% accuracy drop on ImageNet without fine-tuning.

Token compression aims to speed up large-scale vision transformers (e.g., ViTs) by pruning (dropping) or merging tokens. It is an important but challenging task. Although recent approaches have achieved great success, they require a carefully handcrafted compression rate (i.e., the number of tokens to remove), which is tedious to tune and leads to sub-optimal performance. To tackle this problem, we propose Differentiable Compression Rate (DiffRate), a novel token compression method with several appealing properties that prior work lacks. First, DiffRate propagates the loss function's gradient onto the compression ratio, which previous work treats as a non-differentiable hyperparameter. As a result, different layers can automatically learn their own compression rates without extra overhead. Second, token pruning and merging can be performed simultaneously in DiffRate, whereas previous work handled them separately. Third, extensive experiments demonstrate that DiffRate achieves state-of-the-art performance. For example, by applying the learned layer-wise compression rates to an off-the-shelf ViT-H (MAE) model, we achieve a 40% FLOPs reduction and a 1.5x throughput improvement, with a minor accuracy drop of 0.16% on ImageNet without fine-tuning, even outperforming previous methods that use fine-tuning. Code and models are available at https://github.com/OpenGVLab/DiffRate.
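One way to see how a compression rate can receive gradients is sketched below. This is an illustrative sketch only, not the code from the DiffRate repository: it assumes each layer holds learnable logits over a small set of candidate keep counts, and that tokens are already sorted by importance. The function names (`softmax`, `soft_keep_mask`, `expected_kept_tokens`) and the candidate values are hypothetical. Under those assumptions, each token's keep probability is a smooth function of the logits, so a loss computed on the masked tokens could in principle be backpropagated to the rate itself.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def soft_keep_mask(logits, candidate_keep_counts, num_tokens):
    """Keep probability for the token at each importance rank.

    Assumption: tokens are sorted so rank 0 is the most important.
    A token at rank r survives whenever the chosen keep count exceeds r,
    so its keep probability is the total probability of those candidates.
    """
    probs = softmax(logits)
    return [
        sum(p for p, k in zip(probs, candidate_keep_counts) if k > r)
        for r in range(num_tokens)
    ]


def expected_kept_tokens(logits, candidate_keep_counts):
    """Expected number of kept tokens under the candidate distribution."""
    probs = softmax(logits)
    return sum(p * k for p, k in zip(probs, candidate_keep_counts))


# Hypothetical layer: 4 tokens, candidates "keep 2" or "keep 4", equal logits.
mask = soft_keep_mask([0.0, 0.0], [2, 4], num_tokens=4)
kept = expected_kept_tokens([0.0, 0.0], [2, 4])
```

With equal logits, the two most important tokens are always kept and the other two are kept half the time, so the mask sums to the expected keep count. Raising the logit of the "keep 2" candidate lowers that count smoothly, which is the property a gradient-based search over layer-wise rates would rely on.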

Code Implementations: 1 repo
