MVNet: Hyperspectral Remote Sensing Image Classification Based on Hybrid Mamba-Transformer Vision Backbone Architecture
This work addresses hyperspectral image classification for remote sensing applications, presenting an incremental improvement by combining existing methods into a novel hybrid architecture.
The paper tackled hyperspectral image classification challenges like high-dimensional data and limited samples by proposing MVNet, a hybrid Mamba-Transformer architecture that integrates 3D-CNN, Transformer, and Mamba for efficient spatial-spectral feature extraction, achieving improved classification accuracy and computational efficiency on IN, UP, and KSC datasets.
Hyperspectral image (HSI) classification faces challenges such as high-dimensional data, limited training samples, and spectral redundancy, which often lead to overfitting and insufficient generalization capability. This paper proposes a novel MVNet network architecture that integrates 3D-CNN's local feature extraction, Transformer's global modeling, and Mamba's linear complexity sequence modeling capabilities, achieving efficient spatial-spectral feature extraction and fusion. MVNet features a redesigned dual-branch Mamba module, including a State Space Model (SSM) branch and a non-SSM branch employing 1D convolution with SiLU activation, enhancing modeling of both short-range and long-range dependencies while reducing computational latency in traditional Mamba. The optimized HSI-MambaVision Mixer module overcomes the unidirectional limitation of causal convolution, capturing bidirectional spatial-spectral dependencies in a single forward pass through decoupled attention that focuses on high-value features, alleviating parameter redundancy and the curse of dimensionality. On IN, UP, and KSC datasets, MVNet outperforms mainstream hyperspectral image classification methods in both classification accuracy and computational efficiency, demonstrating robust capability in processing complex HSI data.