CV · Jan 9, 2025

Global Compression Commander: Plug-and-Play Inference Acceleration for High-Resolution Large Vision-Language Models

arXiv:2501.05179v5 · 30 citations · h-index: 8 · Has Code
Originality: Incremental advance
AI Analysis

This work addresses the computational bottlenecks faced by users of high-resolution LVLMs, offering a significant speed-up with minimal performance loss. It is an incremental advance, since it builds on existing token compression methods.

The paper tackles the efficiency challenge of high-resolution large vision-language models by proposing a plug-and-play token compression framework. It maintains over 90% of the original performance while compressing 90% of visual tokens, reducing FLOPs to 9.1% and peak memory to 60% of their uncompressed values.

Large vision-language models (LVLMs) excel at visual understanding, but face efficiency challenges due to quadratic complexity in processing long multi-modal contexts. While token compression can reduce computational costs, existing approaches are designed for single-view LVLMs and fail to consider the unique multi-view characteristics of high-resolution LVLMs (HR-LVLMs) with dynamic cropping. Existing methods treat all tokens uniformly, but our analysis reveals that global thumbnails can naturally guide the compression of local crops by providing holistic context for informativeness evaluation. In this paper, we first analyze the dynamic cropping strategy, revealing both the complementary nature of thumbnails and crops and the distinctive characteristics of different crops. Based on our observations, we propose "Global Compression Commander" (GlobalCom$^2$), a novel plug-and-play token compression framework for HR-LVLMs. GlobalCom$^2$ leverages the thumbnail as the "commander" to guide the compression of local crops, adaptively preserving informative details while eliminating redundancy. Extensive experiments show that GlobalCom$^2$ maintains over 90% performance while compressing 90% of visual tokens, reducing FLOPs and peak memory to 9.1% and 60% of their original values, respectively. Our code is available at https://github.com/xuyang-liu16/GlobalCom2.
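The thumbnail-as-commander idea can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say how importance scores are computed or how budgets are assigned, so the code below simply assumes that a per-crop importance score (derived from the thumbnail) and per-token scores are already available, and it splits a retention budget proportionally. The function names `allocate_budgets` and `compress_crop` are hypothetical.

```python
from typing import List


def allocate_budgets(crop_scores: List[float], total_keep: int) -> List[int]:
    """Split a total token budget across crops in proportion to each crop's
    importance score (assumed here to be derived from the global thumbnail)."""
    n = len(crop_scores)
    total = sum(crop_scores)
    # Fall back to a uniform split if the scores carry no signal.
    shares = [s / total for s in crop_scores] if total > 0 else [1.0 / n] * n
    budgets = [int(total_keep * share) for share in shares]
    # Hand out tokens lost to rounding, most important crops first.
    leftover = total_keep - sum(budgets)
    order = sorted(range(n), key=lambda i: shares[i], reverse=True)
    for i in order[:leftover]:
        budgets[i] += 1
    return budgets


def compress_crop(token_scores: List[float], keep: int) -> List[int]:
    """Return the indices of the `keep` highest-scoring tokens of one crop,
    in their original order so positional structure is preserved."""
    keep = min(keep, len(token_scores))
    ranked = sorted(range(len(token_scores)),
                    key=lambda i: token_scores[i], reverse=True)
    return sorted(ranked[:keep])


# Assumed toy input: two crops, the first judged three times as important.
budgets = allocate_budgets([3.0, 1.0], total_keep=4)
kept = compress_crop([0.1, 0.9, 0.5, 0.7], keep=2)
```

Under these assumptions, a crop the thumbnail marks as more informative keeps more of its tokens, while a less informative crop is compressed more aggressively; the overall retention ratio (for example 10% of visual tokens) is fixed by `total_keep`.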

Code Implementations: 1 repo
