Vision Transformer Pruning
This work addresses the storage, memory, and computational bottlenecks of deploying vision transformers on mobile devices, but it is incremental in that it applies existing pruning techniques to a new architecture.
The paper tackles the problem of deploying vision transformers to mobile devices by proposing a pruning approach that identifies and removes less important dimensions, achieving high pruning ratios without significantly compromising accuracy on ImageNet.
Vision transformers have achieved competitive performance on a variety of computer vision applications. However, their storage, run-time memory, and computational demands hinder their deployment on mobile devices. Here we present a vision transformer pruning approach, which identifies the impact of the dimensions in each transformer layer and then prunes accordingly. By encouraging dimension-wise sparsity in the transformer, important dimensions automatically emerge. A large number of dimensions with small importance scores can be discarded to achieve a high pruning ratio without significantly compromising accuracy. The pipeline for vision transformer pruning is as follows: 1) training with sparsity regularization; 2) pruning dimensions of linear projections; 3) fine-tuning. The parameter and FLOPs reduction ratios of the proposed algorithm are evaluated and analyzed on the ImageNet dataset to demonstrate its effectiveness.
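The first two pipeline steps can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how importance scores are parameterized or thresholded, so the sketch assumes one learnable score per input dimension, an L1 penalty as the sparsity regularizer, and a fixed pruning ratio applied by score magnitude. The names `l1_penalty`, `select_dimensions`, and `prune_linear` are hypothetical, and fine-tuning (step 3) is omitted.

```python
def l1_penalty(scores, lam):
    """Assumed sparsity regularizer added to the training loss: lam * sum(|a_i|)."""
    return lam * sum(abs(a) for a in scores)


def select_dimensions(scores, prune_ratio):
    """Return the indices to keep after dropping the lowest-|score| fraction."""
    n_prune = int(len(scores) * prune_ratio)
    # Rank dimensions by importance magnitude, smallest first.
    order = sorted(range(len(scores)), key=lambda i: abs(scores[i]))
    pruned = set(order[:n_prune])
    return [i for i in range(len(scores)) if i not in pruned]


def prune_linear(weight, keep):
    """Keep only the listed input columns of a (d_out x d_in) weight matrix."""
    return [[row[i] for i in keep] for row in weight]


# Toy example: six input dimensions, two output dimensions.
scores = [0.9, 0.01, -0.7, 0.002, 0.4, -0.03]  # illustrative values, not real data
weight = [[float(r * 10 + c) for c in range(6)] for r in range(2)]

keep = select_dimensions(scores, prune_ratio=0.5)
pruned_weight = prune_linear(weight, keep)
```

With these toy scores, the three dimensions with the smallest magnitudes are removed and the projection shrinks from six input columns to three. In a real model the same index set would also have to be applied to whichever layer produces that input, so that the shapes stay consistent.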