CV · Apr 11, 2025

PACT: Pruning and Clustering-Based Token Reduction for Faster Visual Language Models

Mohamed Dhouib, Davide Buscaldi, Sonia Vanier, Aymen Shabou

arXiv:2504.08966v1 · 29.541 citations · h-index: 11 · Has Code · CVPR

Originality: Incremental advance

AI Analysis

PACT addresses the efficiency bottleneck faced by users deploying Visual Language Models in resource-constrained environments, though the advance is incremental because it builds on existing token reduction techniques.

The paper tackles the problem of high computational cost in Visual Language Models due to redundant visual tokens by introducing PACT, a method that prunes irrelevant tokens and merges redundant ones, reducing inference time by up to 30% and memory usage by 25% on standard benchmarks.

Visual Language Models require substantial computational resources for inference due to the additional input tokens needed to represent visual information. However, these visual tokens often contain redundant and unimportant information, resulting in an unnecessarily high number of tokens. To address this, we introduce PACT, a method that reduces inference time and memory usage by pruning irrelevant tokens and merging visually redundant ones at an early layer of the language model. Our approach uses a novel importance metric to identify unimportant tokens without relying on attention scores, making it compatible with FlashAttention. We also propose a novel clustering algorithm, called Distance Bounded Density Peak Clustering, which efficiently clusters visual tokens while constraining the distances between elements within a cluster by a predefined threshold. We demonstrate the effectiveness of PACT through extensive experiments.
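
The abstract does not spell out the clustering algorithm, so the following is only a minimal sketch of the general idea of density-ordered clustering under a distance bound, followed by merging each cluster into one token. It is not the paper's Distance Bounded Density Peak Clustering. The function names `bounded_density_clusters` and `merge_clusters`, the parameters `cutoff` and `max_dist`, and the choice to bound the distance to the cluster center (rather than between every pair of members) are all assumptions made for illustration.

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def bounded_density_clusters(tokens, cutoff, max_dist):
    """Illustrative clustering (hypothetical, not the paper's algorithm).

    Tokens are visited from densest to sparsest. A token joins the nearest
    existing center if it lies within max_dist of it; otherwise it becomes
    a new center. Returns clusters as lists of indices, center first.
    """
    n = len(tokens)
    dist = [[distance(tokens[i], tokens[j]) for j in range(n)] for i in range(n)]
    # Local density: number of other tokens closer than `cutoff`.
    density = [
        sum(1 for j in range(n) if j != i and dist[i][j] < cutoff)
        for i in range(n)
    ]
    # Densest tokens first; ties are broken by index for determinism.
    order = sorted(range(n), key=lambda i: (-density[i], i))
    centers = []
    clusters = {}
    for i in order:
        nearest = min(centers, key=lambda c: dist[i][c], default=None)
        if nearest is not None and dist[i][nearest] <= max_dist:
            clusters[nearest].append(i)
        else:
            centers.append(i)
            clusters[i] = [i]
    return [clusters[c] for c in centers]


def merge_clusters(tokens, clusters):
    """Replace each cluster with the mean of its member vectors."""
    merged = []
    for members in clusters:
        dim = len(tokens[members[0]])
        merged.append(
            [sum(tokens[m][d] for m in members) / len(members) for d in range(dim)]
        )
    return merged


# Toy 2-D "visual tokens": two tight groups that are far apart.
tokens = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [5.1, 5.0]]
clusters = bounded_density_clusters(tokens, cutoff=0.5, max_dist=0.5)
merged = merge_clusters(tokens, clusters)
print(clusters)  # [[0, 1, 2], [3, 4]]
```

In this sketch every member lies within `max_dist` of its center, so any two members of a cluster are at most `2 * max_dist` apart. Five input tokens are reduced to two merged tokens, which is the kind of sequence shortening the abstract describes.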
