CV | Oct 28, 2025

SCOPE: Saliency-Coverage Oriented Token Pruning for Efficient Multimodal LLMs

arXiv:2510.24214v1 | 10 citations | h-index: 10 | Has Code
Originality: Incremental advance
AI Analysis

This work addresses efficiency issues in MLLMs for vision-language tasks, offering an incremental improvement over existing token pruning methods.

The paper targets the computational overhead that redundant visual tokens impose on Multimodal Large Language Models (MLLMs). It proposes SCOPE, a token pruning strategy that jointly models saliency and coverage to preserve semantic completeness, and reports consistent gains over prior approaches on multiple vision-language benchmarks.

Multimodal Large Language Models (MLLMs) typically process a large number of visual tokens, leading to considerable computational overhead, even though many of these tokens are redundant. Existing visual token pruning methods primarily focus on selecting the most salient tokens based on attention scores, resulting in the semantic incompleteness of the selected tokens. In this paper, we propose a novel visual token pruning strategy, called Saliency-Coverage Oriented token Pruning for Efficient MLLMs (SCOPE), to jointly model both the saliency and coverage of the selected visual tokens to better preserve semantic completeness. Specifically, we introduce a set-coverage for a given set of selected tokens, computed based on the token relationships. We then define a token-coverage gain for each unselected token, quantifying how much additional coverage would be obtained by including it. By integrating the saliency score into the token-coverage gain, we propose our SCOPE score and iteratively select the token with the highest SCOPE score. We conduct extensive experiments on multiple vision-language understanding benchmarks using the LLaVA-1.5 and LLaVA-Next models. Experimental results demonstrate that our method consistently outperforms prior approaches. Our code is available at https://github.com/kinredon/SCOPE.
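
The greedy selection loop described in the abstract can be sketched as follows. This is an illustrative sketch only, not the authors' implementation: the abstract does not give the formulas, so the code assumes that "token relationships" are non-negative cosine similarities, that set-coverage is the sum over all tokens of the maximum similarity to any selected token, and that the saliency score enters as a multiplicative weight (raised to a hypothetical exponent alpha). All function and parameter names are made up for illustration.

```python
import math


def cosine_similarity_matrix(features):
    """Pairwise cosine similarity between feature vectors, clipped at 0."""
    norms = [math.sqrt(sum(x * x for x in f)) or 1.0 for f in features]
    n = len(features)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            dot = sum(a * b for a, b in zip(features[i], features[j]))
            sim[i][j] = max(0.0, dot / (norms[i] * norms[j]))
    return sim


def greedy_saliency_coverage_select(features, saliency, k, alpha=1.0):
    """Pick k token indices by (assumed) coverage gain weighted by saliency.

    ASSUMPTIONS (not taken from the paper):
      - set-coverage = sum_j max_{s in selected} sim[s][j]
      - score(i)     = coverage_gain(i) * saliency[i] ** alpha
    """
    sim = cosine_similarity_matrix(features)
    n = len(features)
    covered = [0.0] * n  # covered[j]: best similarity of token j to the selected set
    selected = []
    remaining = set(range(n))
    for _ in range(min(k, n)):
        best, best_score = None, -1.0
        for i in sorted(remaining):
            # Extra coverage obtained if token i were added to the selected set.
            gain = sum(max(sim[i][j] - covered[j], 0.0) for j in range(n))
            score = gain * (saliency[i] ** alpha)
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
        remaining.remove(best)
        covered = [max(c, s) for c, s in zip(covered, sim[best])]
    return selected


# Toy input: tokens 0 and 1 are near-duplicates with high saliency,
# token 2 is distinct but has low saliency.
toy_features = [[1.0, 0.0], [1.0, 0.01], [0.0, 1.0]]
toy_saliency = [0.9, 0.8, 0.3]
print(greedy_saliency_coverage_select(toy_features, toy_saliency, k=2))
```

On this toy input a pure top-2 saliency rule would keep the two near-duplicate tokens 0 and 1, whereas the sketch keeps tokens 0 and 2, because token 1 adds almost no coverage once token 0 is selected.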

