CVJul 16, 2024

Turbo: Informativity-Driven Acceleration Plug-In for Vision-Language Large Models

Chen Ju, Haicheng Wang, Haozhe Cheng, Xu Chen, Zhonghua Zhai, Weilin Huang, Jinsong Lan, Shuai Xiao, Bo Zheng

arXiv:2407.11717v117.317 citationsh-index: 9

Originality Incremental advance

AI Analysis

This addresses efficiency bottlenecks for real-world deployment of VLMs, though it is incremental as it builds on existing acceleration methods by focusing on data redundancy.

The paper tackles the high computational cost of Vision-Language Large Models (VLMs) by introducing Turbo, a plug-in module that accelerates VLMs by pruning inefficient tokens based on information degree, achieving good acceleration with negligible performance drop on multiple benchmarks.

Vision-Language Large Models (VLMs) recently become primary backbone of AI, due to the impressive performance. However, their expensive computation costs, i.e., throughput and delay, impede potentials in the real-world scenarios. To achieve acceleration for VLMs, most existing methods focus on the model perspective: pruning, distillation, quantization, but completely overlook the data-perspective redundancy. To fill the overlook, this paper pioneers the severity of data redundancy, and designs one plug-and-play Turbo module guided by information degree to prune inefficient tokens from visual or textual data. In pursuit of efficiency-performance trade-offs, information degree takes two crucial factors into consideration: mutual redundancy and semantic value. Concretely, the former evaluates data duplication between sequential tokens; while the latter evaluates each token by its contribution to the overall semantics. As a result, tokens with high information degree carry less redundancy and stronger semantics. For VLMs' calculation, Turbo works as a user-friendly plug-in that sorts data referring to information degree, utilizing only top-level ones to save costs. Its advantages are multifaceted, e.g., being generally compatible to various VLMs across understanding and generation, simple use without re-training and trivial engineering efforts. On multiple VLMs benchmarks, we fully experiment to demonstrate the good acceleration of Turbo, under negligible performance drop.

View on arXiv PDF

Similar