CV | Sep 25, 2025

SwinMamba: A hybrid local-global mamba framework for enhancing semantic segmentation of remotely sensed images

arXiv:2509.20918v1 | 3.6 | h-index: 5

Originality: Incremental advance

AI Analysis

This work addresses semantic segmentation for remote sensing applications like land use classification, offering an incremental improvement over existing methods.

The paper addresses semantic segmentation of remote sensing imagery, a task complicated by high spatial resolution and complex scenes. It proposes SwinMamba, a hybrid local-global framework that the authors report outperforms state-of-the-art methods on the LoveDA and ISPRS Potsdam datasets.

Semantic segmentation of remote sensing imagery is a fundamental task in computer vision, supporting a wide range of applications such as land use classification, urban planning, and environmental monitoring. However, this task is often challenged by the high spatial resolution, complex scene structures, and diverse object scales present in remote sensing data. To address these challenges, various deep learning architectures have been proposed, including convolutional neural networks, Vision Transformers, and the recently introduced Vision Mamba. Vision Mamba features a global receptive field and low computational complexity, demonstrating both efficiency and effectiveness in image segmentation. However, its reliance on global scanning tends to overlook critical local features, such as textures and edges, which are essential for achieving accurate segmentation in remote sensing contexts. To tackle this limitation, we propose SwinMamba, a novel framework inspired by the Swin Transformer. SwinMamba integrates localized Mamba-style scanning within shifted windows with a global receptive field, to enhance the model's perception of both local and global features. Specifically, the first two stages of SwinMamba perform local scanning to capture fine-grained details, while its subsequent two stages leverage global scanning to fuse broader contextual information. In our model, the use of overlapping shifted windows enhances inter-region information exchange, facilitating more robust feature integration across the entire image. Extensive experiments on the LoveDA and ISPRS Potsdam datasets demonstrate that SwinMamba outperforms state-of-the-art methods, underscoring its effectiveness and potential as a superior solution for semantic segmentation of remotely sensed imagery.
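The difference between local and global scanning described in the abstract can be illustrated with a small sketch of token orderings. This is not the authors' implementation. The function names, the single row-major scan direction, the non-overlapping window grid, and the Swin-style cyclic shift are all assumptions made for illustration. Only the stage schedule (two local stages followed by two global stages) comes from the abstract.

```python
# Illustrative sketch only: token scan orders for a Mamba-style sequence model.
# All names here are hypothetical; SwinMamba's real scanning may differ
# (for example in scan directions and in how windows overlap).

def global_scan_order(h, w):
    """Row-major order over the whole h x w feature map (one long sequence)."""
    return list(range(h * w))


def window_scan_order(h, w, win, shift=0):
    """Return one token sequence per win x win window.

    Tokens are identified by their flat row-major index. A cyclic shift
    (assumed here, in the style of Swin Transformer) offsets the window
    grid so that shifted windows mix tokens from neighbouring regions.
    """
    if h % win or w % win:
        raise ValueError("this sketch assumes h and w are divisible by win")
    windows = []
    for top in range(0, h, win):
        for left in range(0, w, win):
            tokens = []
            for r in range(top, top + win):
                for c in range(left, left + win):
                    src_r = (r + shift) % h  # wrap around the border
                    src_c = (c + shift) % w
                    tokens.append(src_r * w + src_c)
            windows.append(tokens)
    return windows


# Stage schedule stated in the abstract: local scanning first, global after.
STAGE_SCAN = ("local", "local", "global", "global")


def scan_for_stage(stage, h, w, win=2, shift=0):
    """Pick the scan pattern for a stage index in 0..3."""
    if STAGE_SCAN[stage] == "local":
        return window_scan_order(h, w, win, shift)
    return [global_scan_order(h, w)]


if __name__ == "__main__":
    print(window_scan_order(4, 4, 2))           # unshifted windows
    print(window_scan_order(4, 4, 2, shift=1))  # shifted windows
    print(scan_for_stage(2, 4, 4))              # a global stage
```

On a 4 x 4 map with 2 x 2 windows, the unshifted scan keeps each sequence inside one window, while the shifted scan builds sequences that straddle the original window borders. That is the kind of inter-region exchange the abstract attributes to shifted windows.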

