CV · Apr 25, 2025

Back to Fundamentals: Low-Level Visual Features Guided Progressive Token Pruning

Yuanbing Ouyang, Yizhuo Liang, Qingpeng Li, Xinfei Guo, Yiming Luo, Di Wu, Hao Wang, Yushan Pan

arXiv:2504.17996v13.6 · h-index: 5 · J syst archit

Originality: Incremental advance

AI Analysis

This work addresses the problem of deploying Vision Transformers on resource-constrained devices, offering an incremental improvement over existing token pruning methods.

The paper tackles the high computational cost of Vision Transformers for semantic segmentation by introducing LVTP, a progressive token pruning framework guided by low-level visual features and multi-scale Tsallis entropy, achieving 20%-45% computational reductions with negligible performance loss.

Vision Transformers (ViTs) excel at semantic segmentation but demand significant computation, which makes deployment on resource-constrained devices difficult. Existing token pruning methods often overlook fundamental characteristics of visual data. This study introduces LVTP, a progressive token pruning framework guided by multi-scale Tsallis entropy and low-level visual features, with two rounds of clustering. It combines high-level semantics with basic visual attributes for precise segmentation. A dynamic scoring mechanism based on multi-scale Tsallis entropy weighting overcomes the limitations of traditional single-parameter entropy. The framework also analyzes low-level features to preserve critical edge information while reducing computational cost. As a plug-and-play module, it requires no architectural changes or additional training. Evaluations across multiple datasets show 20%-45% computational reductions with negligible performance loss, and the method outperforms existing approaches in balancing cost and accuracy, especially in complex edge regions.
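To make the entropy-based scoring idea concrete, here is a minimal sketch of scoring tokens with Tsallis entropy at several values of the entropic index q and keeping the highest-scoring ones. It is not the paper's implementation: the function names (`tsallis_entropy`, `multi_scale_score`, `keep_mask`), the q values, the uniform weights, and the choice to keep high-entropy tokens are all assumptions made for illustration. Only the Tsallis entropy formula itself, S_q(p) = (1 - sum_i p_i^q) / (q - 1), is standard.

```python
import numpy as np

def tsallis_entropy(p, q):
    """Tsallis entropy S_q(p) = (1 - sum_i p_i**q) / (q - 1) over the last axis.

    For q -> 1 this reduces to Shannon entropy, handled as a special case.
    Assumes q > 0 and that p holds probabilities summing to 1.
    """
    p = np.asarray(p, dtype=np.float64)
    if np.isclose(q, 1.0):
        safe = np.where(p > 0, p, 1.0)  # log(1) = 0, so zero entries add nothing
        return -np.sum(p * np.log(safe), axis=-1)
    return (1.0 - np.sum(p ** q, axis=-1)) / (q - 1.0)

def multi_scale_score(probs, qs=(0.5, 1.0, 2.0), weights=None):
    """Weighted sum of Tsallis entropies at several q values (hypothetical scheme).

    probs: array of shape (num_tokens, num_classes), one distribution per token.
    Returns one score per token.
    """
    probs = np.asarray(probs, dtype=np.float64)
    if weights is None:
        weights = np.full(len(qs), 1.0 / len(qs))  # assumption: uniform weights
    ents = np.stack([tsallis_entropy(probs, q) for q in qs], axis=0)  # (Q, N)
    return np.tensordot(np.asarray(weights), ents, axes=1)  # (N,)

def keep_mask(scores, keep_ratio):
    """Boolean mask that keeps the top keep_ratio fraction of tokens by score.

    Assumption: uncertain (high-entropy) tokens are the ones worth keeping.
    """
    n = scores.shape[0]
    k = max(1, int(round(keep_ratio * n)))
    top = np.argsort(scores)[::-1][:k]
    mask = np.zeros(n, dtype=bool)
    mask[top] = True
    return mask

# Three toy tokens: fully confident, maximally uncertain, and in between.
probs = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [0.25, 0.25, 0.25, 0.25],
    [0.7, 0.1, 0.1, 0.1],
])
scores = multi_scale_score(probs)
mask = keep_mask(scores, keep_ratio=0.67)
```

In this toy case the fully confident token scores zero at every q and is the one pruned. A real pipeline would need to decide where the per-token distributions come from and how the weights across q are set, neither of which this summary specifies.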
