CV · May 12

Interactive State Space Model with Cross-Modal Local Scanning for Depth Super-Resolution

arXiv:2605.11934
27.2
AI Analysis

For researchers in depth super-resolution, this work offers an efficient alternative to quadratic-complexity attention mechanisms, enabling dense semantic interactions between modalities.

This paper tackles guided depth super-resolution (GDSR) by proposing an Interactive State Space Model with cross-modal local scanning, achieving competitive performance against state-of-the-art methods while maintaining linear complexity.

Guided depth super-resolution (GDSR) reconstructs high-resolution (HR) depth maps from low-resolution (LR) inputs with HR RGB guidance. Existing methods either model each modality independently or rely on computationally expensive attention mechanisms with quadratic complexity, hindering the establishment of efficient and semantically interactive joint representations. In this paper, we observe that feature maps from different modalities exhibit semantic-level correlations during feature extraction. This motivates us to develop a more flexible approach enabling dense, semantically aware deep interactions between modalities. To this end, we propose a novel GDSR framework centered around the Interactive State Space Model. Specifically, we design a cross-modal local scanning mechanism that enables fine-grained semantic interactions between RGB and depth features. Leveraging the Mamba architecture, our framework achieves global modeling with linear complexity. Furthermore, a cross-modal matching transform module is introduced to enhance interactive modeling quality by utilizing representative features from both modalities. Extensive experiments demonstrate competitive performance against state-of-the-art methods.
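The abstract does not spell out how the cross-modal local scan is built, so the following is only an illustrative sketch of one plausible reading, not the authors' implementation. It assumes RGB and depth feature maps of identical shape, splits them into non-overlapping local windows, interleaves the two modalities token by token inside each window, and then runs a single-pass linear recurrence over the resulting sequence. The function names (`local_windows`, `cross_modal_local_sequence`, `linear_scan`), the window size, and the fixed scalar coefficients `a` and `b` are all hypothetical; Mamba itself uses input-dependent (selective) parameters, which this toy recurrence omits.

```python
import numpy as np


def local_windows(feat, win):
    """Split an (H, W, C) map into non-overlapping win x win windows.

    Returns an array of shape (num_windows, win * win, C).
    """
    H, W, C = feat.shape
    assert H % win == 0 and W % win == 0, "H and W must be divisible by win"
    x = feat.reshape(H // win, win, W // win, win, C)
    x = x.transpose(0, 2, 1, 3, 4)  # (H/win, W/win, win, win, C)
    return x.reshape(-1, win * win, C)


def cross_modal_local_sequence(rgb, depth, win):
    """Interleave RGB and depth tokens window by window (assumed scan order).

    Inside each window the order is rgb_0, depth_0, rgb_1, depth_1, ...
    Returns a sequence of shape (2 * H * W, C).
    """
    assert rgb.shape == depth.shape, "sketch assumes aligned feature maps"
    r = local_windows(rgb, win)
    d = local_windows(depth, win)
    seq = np.stack([r, d], axis=2)  # (num_windows, win * win, 2, C)
    return seq.reshape(-1, rgb.shape[-1])


def linear_scan(x, a, b):
    """Toy diagonal recurrence h_t = a * h_{t-1} + b * x_t.

    One pass over the sequence, so the cost grows linearly with its length.
    The coefficients are fixed here, unlike Mamba's selective parameters.
    """
    h = np.zeros(x.shape[1])
    out = np.empty(x.shape, dtype=float)
    for t in range(x.shape[0]):
        h = a * h + b * x[t]
        out[t] = h
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    rgb_feat = rng.standard_normal((8, 8, 16))
    depth_feat = rng.standard_normal((8, 8, 16))
    seq = cross_modal_local_sequence(rgb_feat, depth_feat, win=4)
    fused = linear_scan(seq, a=0.9, b=0.1)
    print(seq.shape, fused.shape)
```

Because neighbouring RGB and depth tokens sit next to each other in the sequence, the recurrent state mixes the two modalities at every step, while the cost stays proportional to the sequence length rather than to its square as in full attention.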
