CV · Mar 14, 2025

Similarity-Aware Token Pruning: Your VLM but Faster

arXiv:2503.11549v1 · 20 citations · h-index: 13
Originality: Incremental advance
AI Analysis

This work addresses efficiency problems for users of large vision and vision-language models by providing a practical, incremental improvement in inference speed without significant accuracy trade-offs.

The paper tackles the computational cost of Vision Transformers and Vision-Language Models by proposing SAINT, a training-free token pruning framework that dynamically optimizes pruning rates. It doubles the throughput of ViT-H/14 with only 0.6% accuracy loss and reduces LLaVA-13B's tokens by 75% with less than 1% performance loss.

The computational demands of Vision Transformers (ViTs) and Vision-Language Models (VLMs) remain a significant challenge due to the quadratic complexity of self-attention. While token pruning offers a promising solution, existing methods often introduce training overhead or fail to adapt dynamically across layers. We present SAINT, a training-free token pruning framework that leverages token similarity and a graph-based formulation to dynamically optimize pruning rates and redundancy thresholds. Through systematic analysis, we identify a universal three-stage token evolution process (aligner-explorer-aggregator) in transformers, enabling aggressive pruning in early stages without sacrificing critical information. For ViTs, SAINT doubles the throughput of ViT-H/14 at 224px with only 0.6% accuracy loss on ImageNet-1K, surpassing the closest competitor by 0.8%. For VLMs, we apply SAINT in three modes: ViT-only, LLM-only, and hybrid. SAINT reduces LLaVA-13B's tokens by 75%, achieving latency comparable to LLaVA-7B with less than 1% performance loss across benchmarks. Our work establishes a unified, practical framework for efficient inference in ViTs and VLMs.
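The idea of similarity-driven pruning with a dynamic rate can be illustrated with a minimal sketch. This is not the SAINT algorithm itself, whose details the abstract does not give. It is a simple greedy stand-in, assuming cosine similarity as the redundancy measure, a hypothetical threshold `tau`, and a hypothetical `protected` set for tokens such as a class token. Tokens are treated as nodes in a graph with an edge wherever similarity exceeds `tau`, and a token is dropped if it is connected to a token that has already been kept. The pruning rate is therefore not fixed in advance but follows from how redundant the tokens are.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def prune_redundant_tokens(tokens, tau=0.9, protected=(0,)):
    """Greedy similarity-based pruning (illustrative, not the paper's method).

    A token is kept if it is protected, or if its similarity to every
    previously kept token is at most `tau`. Returns indices of kept tokens.
    """
    kept = []
    for i, tok in enumerate(tokens):
        if i in protected or all(cosine(tok, tokens[j]) <= tau for j in kept):
            kept.append(i)
    return kept


# Toy token embeddings (made up for illustration only).
tokens = [
    [1.0, 0.0, 0.0],    # index 0: treated as a protected class token
    [0.0, 1.0, 0.0],
    [0.0, 0.99, 0.05],  # near-duplicate of index 1
    [0.0, 0.0, 1.0],
    [0.02, 0.0, 1.0],   # near-duplicate of index 3
]

kept = prune_redundant_tokens(tokens, tau=0.9)
pruning_rate = 1 - len(kept) / len(tokens)
print(kept, pruning_rate)
```

On this toy input the two near-duplicate tokens are dropped, so the pruning rate is determined by the data rather than by a preset budget. Lowering `tau` prunes more aggressively, which is the kind of knob a layer-wise schedule could tune.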

Code Implementations: 1 repo
