CVMay 20, 2024

Mamba-in-Mamba: Centralized Mamba-Cross-Scan in Tokenized Mamba Model for Hyperspectral Image Classification

arXiv:2405.12003v4116 citationsh-index: 6Neurocomputing
Originality Incremental advance
AI Analysis

This work addresses hyperspectral image classification for remote sensing applications, presenting a novel method that improves upon existing approaches but is incremental in adapting state space models to this domain.

The paper tackled hyperspectral image classification by introducing the Mamba-in-Mamba architecture, which uses a centralized Mamba-Cross-Scan mechanism and tokenized Mamba encoder to address inefficiencies in existing sequential models, achieving superior performance over state-of-the-art methods on three public datasets.

Hyperspectral image (HSI) classification is pivotal in the remote sensing (RS) field, particularly with the advancement of deep learning techniques. Sequential models, adapted from the natural language processing (NLP) field such as Recurrent Neural Networks (RNNs) and Transformers, have been tailored to this task, offering a unique viewpoint. However, several challenges persist 1) RNNs struggle with centric feature aggregation and are sensitive to interfering pixels, 2) Transformers require significant computational resources and often underperform with limited HSI training samples, and 3) Current scanning methods for converting images into sequence-data are simplistic and inefficient. In response, this study introduces the innovative Mamba-in-Mamba (MiM) architecture for HSI classification, the first attempt of deploying State Space Model (SSM) in this task. The MiM model includes 1) A novel centralized Mamba-Cross-Scan (MCS) mechanism for transforming images into sequence-data, 2) A Tokenized Mamba (T-Mamba) encoder that incorporates a Gaussian Decay Mask (GDM), a Semantic Token Learner (STL), and a Semantic Token Fuser (STF) for enhanced feature generation and concentration, and 3) A Weighted MCS Fusion (WMF) module coupled with a Multi-Scale Loss Design to improve decoding efficiency. Experimental results from three public HSI datasets with fixed and disjoint training-testing samples demonstrate that our method outperforms existing baselines and state-of-the-art approaches, highlighting its efficacy and potential in HSI applications.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes