CV · AI · May 12

TCP-SSM: Efficient Vision State Space Models with Token-Conditioned Poles

arXiv:2605.11563 · 47.0
Predicted impact top 72% in CV · last 90 days · Originality: Incremental advance
AI Analysis

For vision tasks using state space models, TCP-SSM offers an efficient and interpretable alternative to existing SSM variants, though improvements are incremental over strong baselines.

TCP-SSM introduces token-conditioned poles to make state space model recurrence explicit and interpretable, reducing SSM computation by up to 44% in Vision Mamba-style models while maintaining or surpassing accuracy on image classification, segmentation, and detection.

State Space Models (SSMs) have emerged as a compelling alternative to attention models for long-range vision tasks, offering input-dependent recurrence with linear complexity. However, most efficient SSM variants reduce computational cost by modifying scan routes, resolutions, or traversal patterns, while leaving the recurrent dynamics largely implicit. Consequently, the model's state-dependent memory behavior is difficult to control, particularly in compact backbones where long scan paths can exceed the effective memory horizon. We propose Token-Conditioned Poles SSM (TCP-SSM), a structured selective SSM framework that improves efficiency while making recurrence dynamics explicit and interpretable through stable poles. TCP-SSM builds each scan operator from (1) real poles that model monotone or sign-alternating decay, and (2) complex-conjugate poles that capture damped oscillatory responses. Using bounded radius and angle modulation, TCP-SSM converts shared base poles into token-dependent poles, allowing each scan step to adapt its memory behavior to the current visual token while preserving pole stability. For practical scalability, we integrate grouped pole sharing with a lightweight low-rank input pathway, yielding an efficient scan operator that preserves linear-time scan complexity. Across image classification, semantic segmentation, and object detection, TCP-SSM reduces SSM computation by up to 44% in Vision Mamba-style models while maintaining or surpassing baseline accuracy.
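The pole mechanism in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not give the parameterization, so the sigmoid/tanh bounding, the function names (token_poles, pole_scan), and all parameters (w_r, w_theta, w_down, w_up, max_dr, max_dtheta) are hypothetical. It assumes each pole is written as radius * exp(i * angle), with the radius kept strictly inside (0, 1) so the recurrence stays stable. An angle of 0 gives a real pole with monotone decay, an angle of pi gives sign-alternating decay, and other angles give a damped oscillation whose real part corresponds to a complex-conjugate pair. Grouped pole sharing is omitted for brevity.

```python
import numpy as np

def token_poles(x, base_radius, base_angle, w_r, w_theta,
                max_dr=1.0, max_dtheta=0.5):
    """Turn shared base poles into per-token poles (hypothetical parameterization).

    x: (T, D) tokens; base_radius, base_angle: (P,); w_r, w_theta: (D, P).
    Returns complex poles of shape (T, P) with modulus strictly below 1.
    """
    # Work in logit space so the modulated radius always stays in (0, 1).
    base_logit = np.log(base_radius) - np.log1p(-base_radius)
    radius = 1.0 / (1.0 + np.exp(-(base_logit + max_dr * np.tanh(x @ w_r))))
    # Bounded angle shift: at most max_dtheta radians away from the base angle.
    angle = base_angle + max_dtheta * np.tanh(x @ w_theta)
    return radius * np.exp(1j * angle)

def pole_scan(x, poles, w_down, w_up):
    """Linear-time diagonal scan: h_t = pole_t * h_{t-1} + u_t.

    The input drive u goes through a low-rank path, D -> r -> P.
    Returns the real part of the state at every step, shape (T, P).
    """
    u = (x @ w_down) @ w_up
    h = np.zeros(poles.shape[1], dtype=complex)
    out = np.empty(poles.shape, dtype=float)
    for t in range(x.shape[0]):
        h = poles[t] * h + u[t]
        out[t] = h.real
    return out

# Toy usage: 16 tokens, 8 channels, 4 poles, rank-2 input pathway.
rng = np.random.default_rng(0)
T, D, P, r = 16, 8, 4, 2
x = rng.normal(size=(T, D))
base_radius = np.array([0.90, 0.90, 0.80, 0.95])
base_angle = np.array([0.0, np.pi, 0.3, 1.0])  # real+, real-, two oscillatory
w_r = rng.normal(size=(D, P))
w_theta = rng.normal(size=(D, P))
w_down = rng.normal(size=(D, r))
w_up = rng.normal(size=(r, P))

poles = token_poles(x, base_radius, base_angle, w_r, w_theta)
y = pole_scan(x, poles, w_down, w_up)
```

Because the modulation is bounded, every token-dependent pole stays inside the unit circle no matter how large the token activations are, which is the stability property the abstract emphasizes. Setting w_r and w_theta to zero recovers the shared base poles exactly.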

