CV · RO · Jun 22, 2025

Cross-modal State Space Modeling for Real-time RGB-thermal Wild Scene Semantic Segmentation

arXiv:2506.17869v1 · 3 citations · h-index: 39 · Has Code · IROS
Originality: Highly original
AI Analysis

This work addresses the computational cost of RGB-thermal semantic segmentation on resource-constrained field robots operating in wild environments, a targeted efficiency gain for this domain.

The paper tackles the high computational overhead of RGB-thermal semantic segmentation for field robots by introducing CM-SSM, an efficient architecture based on cross-modal state space modeling. CM-SSM achieves state-of-the-art performance on the CART dataset with linear computational complexity and fewer parameters.

The integration of RGB and thermal data can significantly improve semantic segmentation performance in wild environments for field robots. Nevertheless, multi-source data processing (e.g., with Transformer-based approaches) imposes significant computational overhead, presenting challenges for resource-constrained systems. To resolve this critical limitation, we introduced CM-SSM, an efficient RGB-thermal semantic segmentation architecture leveraging a cross-modal state space modeling (SSM) approach. Our framework comprises two key components. First, we introduced a cross-modal 2D-selective-scan (CM-SS2D) module to establish SSM between RGB and thermal modalities, which constructs cross-modal visual sequences and derives hidden state representations of one modality from the other. Second, we developed a cross-modal state space association (CM-SSA) module that effectively integrates global associations from CM-SS2D with local spatial features extracted through convolutional operations. In contrast with Transformer-based approaches, CM-SSM achieves linear computational complexity with respect to image resolution. Experimental results show that CM-SSM achieves state-of-the-art performance on the CART dataset with fewer parameters and lower computational cost. Further experiments on the PST900 dataset demonstrate its generalizability. Code is available at https://github.com/xiaodonguo/CMSSM.
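To make the idea of a cross-modal state space scan more concrete, the following is a minimal toy sketch, not the authors' CM-SS2D module. It assumes single-channel feature maps, a row-major flattening into sequences, a fixed scalar decay, and a sigmoid readout gate; the function names `flatten_rows` and `cross_modal_scan` and the `decay` parameter are hypothetical and chosen only for illustration. The hidden state is written from the thermal tokens and read out under a gate computed from the RGB tokens, so each output mixes both modalities while every token is visited exactly once.

```python
import math


def flatten_rows(feature_map):
    """Flatten an H x W map (list of rows) into a row-major token sequence."""
    return [value for row in feature_map for value in row]


def cross_modal_scan(rgb_seq, thermal_seq, decay=0.9):
    """Toy 1D cross-modal linear recurrence (illustrative assumption only).

    h_t = decay * h_{t-1} + (1 - decay) * thermal_t   # state written by thermal
    y_t = sigmoid(rgb_t) * h_t                        # readout gated by RGB
    """
    if len(rgb_seq) != len(thermal_seq):
        raise ValueError("both modalities must yield sequences of equal length")
    h = 0.0
    outputs = []
    for x_rgb, x_th in zip(rgb_seq, thermal_seq):
        # Update the hidden state from the thermal token.
        h = decay * h + (1.0 - decay) * x_th
        # Read the state out through a gate derived from the RGB token.
        gate = 1.0 / (1.0 + math.exp(-x_rgb))
        outputs.append(gate * h)
    return outputs


# Tiny 2 x 2 example: constant thermal map, zero RGB map (gate = 0.5).
rgb = [[0.0, 0.0], [0.0, 0.0]]
thermal = [[1.0, 1.0], [1.0, 1.0]]
result = cross_modal_scan(flatten_rows(rgb), flatten_rows(thermal), decay=0.5)
print(result)
```

The loop performs one constant-cost update per token, so the work grows linearly with the number of pixels H x W. Self-attention over the same sequence compares every token with every other token, which grows quadratically; this is the contrast the abstract draws with Transformer-based approaches.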

Code Implementations: 1 repo
