CV | LG | MM | Jun 10, 2025

Diversity-Guided MLP Reduction for Efficient Large Vision Transformers

arXiv:2506.08591v2 | h-index: 21 | Has Code
Originality: Incremental advance
AI Analysis

This work addresses efficiency issues for users of large vision transformers, offering a near-lossless compression method that is incremental but highly effective.

The paper tackles the high computational and memory costs of large vision transformers by proposing a method that reduces parameters and FLOPs with minimal performance loss. It reports a reduction of more than 57.0% in parameters and FLOPs across several models, and 71.5% for EVA-CLIP-E.

Transformer models exhibit an excellent scaling property: performance improves as model capacity increases. However, large-scale model parameters lead to prohibitive computing and memory costs. We analyze popular transformer architectures and find that multilayer perceptron (MLP) modules account for the majority of model parameters. To this end, we focus on the recoverability of the compressed models and propose a Diversity-Guided MLP Reduction (DGMR) method to significantly reduce the parameters of large vision transformers with only negligible performance degradation. Specifically, we apply a Gram-Schmidt weight pruning strategy to eliminate redundant neurons in the MLP hidden layer while preserving weight diversity for better performance recovery during distillation. Compared to a model trained from scratch, our pruned model requires only unlabeled ImageNet-1K images, an amount equal to 0.06% of LAION-2B (the data used to train large vision transformers), to recover the original performance. Experimental results on several state-of-the-art large vision transformers demonstrate that our method achieves a reduction of more than 57.0% in parameters and FLOPs in a near-lossless manner. Notably, for EVA-CLIP-E (4.4B), our method achieves a 71.5% reduction in parameters and FLOPs without performance degradation. The source code and trained weights are available at https://github.com/visresearch/DGMR.
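
To make the idea of Gram-Schmidt pruning in the abstract concrete, here is a minimal NumPy sketch of greedy, diversity-preserving neuron selection for one MLP block. It is an illustration under stated assumptions, not the authors' implementation (see the linked repository for that): the function names `select_diverse_neurons` and `prune_mlp` are hypothetical, and the selection rule (pick the neuron with the largest residual norm, then project its direction out of the rest) is an assumed criterion that may differ from the paper's.

```python
import numpy as np


def select_diverse_neurons(w_in, keep, tol=1e-12):
    """Greedily pick up to `keep` hidden neurons (rows of w_in) via Gram-Schmidt.

    w_in has shape (hidden_dim, model_dim). Each step selects the row whose
    component orthogonal to the already selected rows is largest, so rows that
    are nearly linear combinations of earlier picks are treated as redundant.
    NOTE: this selection criterion is an assumption made for illustration.
    """
    residual = np.asarray(w_in, dtype=np.float64).copy()
    picked = np.zeros(residual.shape[0], dtype=bool)
    selected = []
    for _ in range(keep):
        norms = np.linalg.norm(residual, axis=1)
        norms[picked] = -1.0  # never pick the same neuron twice
        idx = int(np.argmax(norms))
        if norms[idx] <= tol:
            break  # every remaining row lies in the span of the selected ones
        picked[idx] = True
        selected.append(idx)
        # Remove the chosen direction from all residual rows.
        q = residual[idx] / norms[idx]
        residual -= np.outer(residual @ q, q)
    return selected


def prune_mlp(w_in, b_in, w_out, keep):
    """Slice a two-layer MLP down to the selected hidden neurons.

    Shapes: w_in (hidden, d), b_in (hidden,), w_out (d, hidden).
    """
    idx = select_diverse_neurons(w_in, keep)
    return w_in[idx], b_in[idx], w_out[:, idx]


# Toy example: rows 0 and 1 are parallel to row 4, so they are redundant.
w_in = np.array([
    [1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 2.0, 0.0],
    [0.0, 0.0, 0.5],
    [2.0, 0.0, 0.0],
])
b_in = np.zeros(5)
w_out = np.ones((3, 5))
kept = select_diverse_neurons(w_in, keep=3)
small_w_in, small_b_in, small_w_out = prune_mlp(w_in, b_in, w_out, keep=3)
```

In this toy case the three kept neurons span all three input directions, while the two rows parallel to an already selected one are dropped. A pruned block like this would then be fine-tuned by distillation against the original model, which is the recovery step the abstract describes.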

Code Implementations: 1 repo
