CV · Apr 25, 2025

Back to Fundamentals: Low-Level Visual Features Guided Progressive Token Pruning

arXiv:2504.17996v1 · h-index: 5 · J syst archit
Originality: Incremental advance
AI Analysis

This work addresses the problem of deploying Vision Transformers on resource-constrained devices, offering an incremental improvement over existing token pruning methods.

The paper tackles the high computational cost of Vision Transformers for semantic segmentation by introducing LVTP, a progressive token pruning framework guided by low-level visual features and multi-scale Tsallis entropy, achieving 20%-45% computational reductions with negligible performance loss.

Vision Transformers (ViTs) excel in semantic segmentation but demand significant computation, which makes them hard to deploy on resource-constrained devices. Existing token pruning methods often overlook fundamental characteristics of visual data. This study introduces LVTP, a progressive token pruning framework guided by multi-scale Tsallis entropy and low-level visual features, with two rounds of clustering. It integrates high-level semantics with basic visual attributes for precise segmentation. A novel dynamic scoring mechanism based on multi-scale Tsallis entropy weighting overcomes the limitations of traditional single-parameter entropy. The framework also incorporates low-level feature analysis to preserve critical edge information while reducing computational cost. As a plug-and-play module, it requires no architectural changes or additional training. Evaluations across multiple datasets show 20%-45% computational reductions with negligible performance loss, outperforming existing methods in balancing cost and accuracy, especially in complex edge regions.
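To make the entropy-guided scoring idea concrete, here is a minimal sketch of scoring image patches with Tsallis entropy at several values of q and keeping the highest-scoring ones. It is not the paper's LVTP implementation. The intensity-histogram input, the q values, the uniform weights, the keep ratio, and the function names `patch_scores` and `keep_mask` are all assumptions made for illustration, and the clustering and low-level edge analysis described above are not reproduced. Only the Tsallis entropy formula itself, S_q(p) = (1 - Σ p_i^q) / (q - 1), is standard.

```python
import numpy as np


def tsallis_entropy(p, q):
    """Tsallis entropy S_q(p) = (1 - sum_i p_i^q) / (q - 1).

    The limit q -> 1 recovers Shannon entropy (natural log).
    """
    p = np.asarray(p, dtype=np.float64)
    p = p[p > 0]  # drop empty bins; they contribute nothing
    if np.isclose(q, 1.0):
        return float(-np.sum(p * np.log(p)))
    return float((1.0 - np.sum(p ** q)) / (q - 1.0))


def patch_scores(image, patch=16, qs=(0.5, 1.0, 2.0), weights=None, bins=32):
    """Score non-overlapping patches of a grayscale image in [0, 1].

    Assumption: each score is a weighted sum of Tsallis entropies of the
    patch's intensity histogram, one term per q. The paper's actual
    weighting scheme may differ.
    """
    if weights is None:
        weights = np.full(len(qs), 1.0 / len(qs))  # uniform weights (assumed)
    rows, cols = image.shape[0] // patch, image.shape[1] // patch
    scores = np.zeros((rows, cols))
    for r in range(rows):
        for c in range(cols):
            block = image[r * patch:(r + 1) * patch, c * patch:(c + 1) * patch]
            block = np.clip(block, 0.0, 1.0)  # keep every pixel inside the histogram range
            hist, _ = np.histogram(block, bins=bins, range=(0.0, 1.0))
            p = hist / hist.sum()
            scores[r, c] = sum(w * tsallis_entropy(p, q) for w, q in zip(weights, qs))
    return scores


def keep_mask(scores, keep_ratio=0.7):
    """Boolean mask that keeps the top `keep_ratio` fraction of patches by score."""
    flat = scores.ravel()
    k = max(1, int(round(keep_ratio * flat.size)))
    mask = np.zeros(flat.size, dtype=bool)
    mask[np.argsort(flat)[-k:]] = True
    return mask.reshape(scores.shape)
```

In this sketch a flat patch has a one-bin histogram and scores zero for every q, so it is the first candidate for pruning, while textured or edge-heavy patches score higher and are retained. A progressive scheme would presumably apply such a mask repeatedly at successive layers with a shrinking keep ratio, though that schedule is also an assumption here.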

