CV, CL, LG · Jan 31, 2023

UPop: Unified and Progressive Pruning for Compressing Vision-Language Transformers

arXiv:2301.13741v3 · 64 citations · h-index: 21 · Has Code
Originality: Incremental advance
AI Analysis

This work addresses the need for efficient multimodal AI models, though it is incremental because it builds on existing pruning techniques.

The paper tackles the problem of compressing vision-language Transformers by proposing UPop, a framework that searches multimodal subnets in a unified space and prunes them progressively, achieving higher compression ratios across various tasks and datasets.

Real-world data contains a vast amount of multimodal information, among which vision and language are the two most representative modalities. Moreover, increasingly heavy models, e.g., Transformers, have drawn researchers' attention to model compression. However, how to compress multimodal models, especially vision-language Transformers, is still under-explored. This paper proposes Unified and Progressive Pruning (UPop) as a universal vision-language Transformer compression framework, which incorporates 1) unified searching of multimodal subnets in a continuous optimization space from the original model, which enables automatic assignment of pruning ratios among compressible modalities and structures; and 2) progressive searching and retraining of the subnet, which maintains convergence between the search and retraining to attain higher compression ratios. Experiments on various tasks, datasets, and model architectures demonstrate the effectiveness and versatility of the proposed UPop framework. The code is available at https://github.com/sdc17/UPop.
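The two ideas in the abstract, a single search space shared by all modalities and structures plus a pruning ratio that grows step by step, can be illustrated with a toy sketch. This is not the UPop algorithm or the code in the linked repository; it assumes each prunable structure (attention head, FFN neuron) carries one learnable mask score, that the search ranks all scores in one global pool, and that the target ratio follows a linear schedule. `MaskEntry`, `target_ratio`, `unified_prune`, and `per_group_ratios` are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Hashable, List


@dataclass
class MaskEntry:
    """One prunable structure with a learnable mask score (hypothetical layout)."""
    modality: str   # e.g. "vision" or "language"
    structure: str  # e.g. "attn_head" or "ffn_neuron"
    score: float    # mask magnitude assumed to come from the search phase


def target_ratio(step: int, total_steps: int, final_ratio: float) -> float:
    """Assumed linear schedule: the pruned fraction grows from 0 to final_ratio."""
    return final_ratio * min(step, total_steps) / total_steps


def unified_prune(entries: List[MaskEntry], ratio: float) -> List[bool]:
    """Rank every entry in one global pool, ignoring modality and structure type,
    and drop the lowest-scoring `ratio` fraction. Returns True for kept entries."""
    k = round(ratio * len(entries))
    order = sorted(range(len(entries)), key=lambda i: entries[i].score)
    pruned = set(order[:k])
    return [i not in pruned for i in range(len(entries))]


def per_group_ratios(entries: List[MaskEntry], keep: List[bool],
                     key: Callable[[MaskEntry], Hashable]) -> Dict[Hashable, float]:
    """Pruning ratio that emerges for each group (e.g. per modality)."""
    totals: Dict[Hashable, int] = {}
    dropped: Dict[Hashable, int] = {}
    for entry, kept in zip(entries, keep):
        g = key(entry)
        totals[g] = totals.get(g, 0) + 1
        dropped[g] = dropped.get(g, 0) + (0 if kept else 1)
    return {g: dropped[g] / totals[g] for g in totals}


# Made-up mask scores for a tiny two-modality model.
entries = [
    MaskEntry("vision", "attn_head", 0.90),
    MaskEntry("vision", "attn_head", 0.20),
    MaskEntry("vision", "ffn_neuron", 0.10),
    MaskEntry("vision", "ffn_neuron", 0.70),
    MaskEntry("vision", "ffn_neuron", 0.05),
    MaskEntry("vision", "ffn_neuron", 0.60),
    MaskEntry("language", "attn_head", 0.80),
    MaskEntry("language", "attn_head", 0.40),
    MaskEntry("language", "ffn_neuron", 0.30),
    MaskEntry("language", "ffn_neuron", 0.95),
]

total_steps, final_ratio = 4, 0.4
keep = [True] * len(entries)
for step in range(1, total_steps + 1):
    # In a real pipeline, weights and mask scores would keep training between
    # steps; here the scores stay fixed so the toy is deterministic.
    keep = unified_prune(entries, target_ratio(step, total_steps, final_ratio))
    print(f"step {step}: kept {sum(keep)} of {len(entries)}")  # kept 9, 8, 7, 6

print(per_group_ratios(entries, keep, key=lambda e: e.modality))
# {'vision': 0.5, 'language': 0.25}
```

Although the overall target is 40%, the global ranking ends up pruning half of the vision structures and a quarter of the language ones; no per-modality ratio was set by hand. That is the sense in which a unified search can assign pruning ratios automatically, under the assumptions stated above.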

Code Implementations: 2 repos