CV · May 20, 2024
Mamba-in-Mamba: Centralized Mamba-Cross-Scan in Tokenized Mamba Model for Hyperspectral Image Classification
Weilian Zhou, Sei-Ichiro Kamata, Haipeng Wang et al.
Hyperspectral image (HSI) classification is pivotal in the remote sensing (RS) field, particularly with the advancement of deep learning techniques. Sequential models adapted from the natural language processing (NLP) field, such as Recurrent Neural Networks (RNNs) and Transformers, have been tailored to this task, offering a unique viewpoint. However, several challenges persist: 1) RNNs struggle with centric feature aggregation and are sensitive to interfering pixels, 2) Transformers require significant computational resources and often underperform with limited HSI training samples, and 3) current scanning methods for converting images into sequence data are simplistic and inefficient. In response, this study introduces the Mamba-in-Mamba (MiM) architecture for HSI classification, the first attempt to deploy a State Space Model (SSM) for this task. The MiM model includes 1) a novel centralized Mamba-Cross-Scan (MCS) mechanism for transforming images into sequence data, 2) a Tokenized Mamba (T-Mamba) encoder that incorporates a Gaussian Decay Mask (GDM), a Semantic Token Learner (STL), and a Semantic Token Fuser (STF) for enhanced feature generation and concentration, and 3) a Weighted MCS Fusion (WMF) module coupled with a Multi-Scale Loss Design to improve decoding efficiency. Experimental results on three public HSI datasets with fixed and disjoint training-testing samples demonstrate that our method outperforms existing baselines and state-of-the-art approaches, highlighting its efficacy and potential in HSI applications.
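The abstract does not say how the Gaussian Decay Mask is defined. As a rough illustration of the general idea only, the sketch below builds a patch-sized weight grid that decays with distance from the center pixel. The function name `gaussian_decay_mask`, the `sigma` parameter, and the isotropic Gaussian form are assumptions for illustration, not the paper's formulation.

```python
import math

def gaussian_decay_mask(size, sigma):
    """Return a size x size grid of weights that decay with distance from the center.

    Hypothetical helper: the isotropic Gaussian form is an assumption.
    """
    c = size // 2  # index of the center pixel for an odd patch size
    return [
        [math.exp(-((r - c) ** 2 + (col - c) ** 2) / (2.0 * sigma ** 2))
         for col in range(size)]
        for r in range(size)
    ]

# Example: a 5x5 patch; the center pixel keeps full weight, the corners the least.
mask = gaussian_decay_mask(size=5, sigma=1.5)
```

Multiplying patch features by such a mask would down-weight pixels far from the center pixel being classified, which is one plausible way to reduce the influence of interfering pixels.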
CV · Feb 18, 2025
When Segmentation Meets Hyperspectral Image: New Paradigm for Hyperspectral Image Classification
Weilian Zhou, Weixuan Xie, Sei-ichiro Kamata et al.
Hyperspectral image (HSI) classification is a cornerstone of remote sensing, enabling precise material and land-cover identification through rich spectral information. While deep learning has driven significant progress in this task, small patch-based classifiers, which account for over 90% of the progress, face limitations: (1) sampling with small patches (e.g., 7x7, 9x9) yields a limited receptive field, which provides too little of the spatial structural information needed for object-level identification and produces noise-like misclassifications even within uniform regions; (2) undefined optimal patch sizes lead to coarse label predictions, which degrade performance; and (3) a lack of multi-shape awareness around objects. To address these challenges, we draw inspiration from large-scale image segmentation techniques, which excel at handling object boundaries, a capability essential for semantic labeling in HSI classification. However, their application remains under-explored in this task due to (1) the prevailing notion that larger patch sizes degrade performance, (2) the extensive unlabeled regions in HSI ground truth, and (3) the misalignment of input shapes between HSI data and segmentation models. Thus, in this study, we propose a novel paradigm and baseline, HSIseg, for HSI classification that leverages segmentation techniques combined with a novel Dynamic Shifted Regional Transformer (DSRT) to overcome these challenges. We also introduce an intuitive progressive learning framework with adaptive pseudo-labeling to iteratively incorporate unlabeled regions into the training process, thereby advancing the application of segmentation techniques. Additionally, we incorporate auxiliary data through multi-source data collaboration, promoting better feature interaction. Validated on five public HSI datasets, our proposal outperforms state-of-the-art methods.
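The abstract does not describe the pseudo-labeling rule itself. The sketch below shows one common form of progressive pseudo-labeling, assuming a simple confidence threshold that is relaxed between rounds. The function name `pseudo_label`, the threshold values, and the flat list-of-pixels layout are hypothetical and are not taken from HSIseg.

```python
def pseudo_label(probs, labels, threshold):
    """Fill unlabeled pixels (None) with the predicted class when it is confident enough.

    probs:  per-pixel lists of class probabilities (hypothetical model output)
    labels: per-pixel class index, or None where the ground truth is unlabeled
    """
    out = list(labels)
    for i, (p, y) in enumerate(zip(probs, labels)):
        if y is None:
            k = max(range(len(p)), key=p.__getitem__)  # most likely class
            if p[k] >= threshold:
                out[i] = k
    return out

# Toy example with three pixels and two classes.
probs = [[0.9, 0.1], [0.55, 0.45], [0.2, 0.8]]
labels = [None, None, 1]
round1 = pseudo_label(probs, labels, threshold=0.8)  # only the confident pixel is filled
round2 = pseudo_label(probs, round1, threshold=0.5)  # a relaxed threshold fills the rest
```

In an actual training loop the model would be retrained between rounds, so `probs` would change from one round to the next; the fixed `probs` here only keeps the example short.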
CV · Mar 28, 2021
Rethinking ResNets: Improved Stacking Strategies With High Order Schemes
Zhengbo Luo, Zitang Sun, Weilian Zhou et al.
Various deep neural network (DNN) architectures hold many important records in computer vision. Although these architectures draw attention worldwide, the design of their overall structure lacks general guidance. Based on the relationship between DNN design and numerical differential equations, we performed a fair comparison of the residual design with higher-order perspectives. We show that the widely used DNN design strategy, repeatedly stacking a small block (usually 2-3 layers), can be easily improved, supported by solid theoretical knowledge and with no extra parameters needed. We reorganise the residual design in higher-order ways, inspired by the observation that many effective networks can be interpreted as different numerical discretisations of differential equations. The design of ResNet follows a relatively simple scheme, forward Euler; however, the situation rapidly becomes complicated when blocks are stacked. If we suppose that a stacked ResNet is in some sense equivalent to a higher-order scheme, then the current method of forward propagation might be relatively weak compared with a typical high-order method such as Runge-Kutta. We propose HO-ResNet to verify this hypothesis on widely used CV benchmarks with sufficient experiments. Stable and noticeable increases in performance are observed, and convergence and robustness are also improved. Our stacking strategy improved ResNet-30 by 2.15 per cent and ResNet-58 by 2.35 per cent on CIFAR-10, with the same settings and parameters. The proposed strategy is fundamental and theoretical and can therefore be applied to any network as a general guideline.
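To make the Euler versus Runge-Kutta comparison concrete, here is a minimal sketch that treats a residual block as one step of an ODE solver. The function names, the step size `h`, and the choice of the midpoint (second-order Runge-Kutta) rule are assumptions for illustration. The abstract does not state which higher-order schemes HO-ResNet uses, so this is not the authors' architecture.

```python
import math

def euler_block(x, f, h=1.0):
    """Plain residual update x + h * f(x), i.e. one forward Euler step."""
    return x + h * f(x)

def rk2_block(x, f, h=1.0):
    """Midpoint (second-order Runge-Kutta) update that reuses the same f twice."""
    k1 = f(x)
    k2 = f(x + 0.5 * h * k1)
    return x + h * k2

def integrate(block, f, x0, steps, h):
    """Stack `steps` identical blocks, mirroring how residual blocks are stacked."""
    x = x0
    for _ in range(steps):
        x = block(x, f, h)
    return x

# Toy check on dx/dt = -x with x(0) = 1, whose exact solution at t = 1 is exp(-1).
f = lambda x: -x
exact = math.exp(-1.0)
euler_err = abs(integrate(euler_block, f, 1.0, steps=10, h=0.1) - exact)
rk2_err = abs(integrate(rk2_block, f, 1.0, steps=10, h=0.1) - exact)
```

Both blocks call the same function `f`, so in a network setting the higher-order update would reuse the same weights, which fits the abstract's claim of no extra parameters. On this toy problem the midpoint update gives a smaller error than forward Euler at the same step size.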
CV · Sep 28, 2015
Fast Non-local Stereo Matching based on Hierarchical Disparity Prediction
Xuan Luo, Xuejiao Bai, Shuo Li et al.
Stereo matching is the key step in estimating depth from two or more images. Recently, some tree-based non-local stereo matching methods have been proposed, which achieved state-of-the-art performance. These algorithms employ tree structures to aggregate cost, which improves the performance and reduces the computation load of stereo matching. However, the computational complexity of these tree-based algorithms is still high because they search over the entire disparity range. In addition, the extreme greediness of the minimum spanning tree (MST) causes poor performance in large areas with similar colors but varying disparities. In this paper, we propose an efficient stereo matching method using a hierarchical disparity prediction (HDP) framework to dramatically reduce the disparity search range so as to speed up the tree-based non-local stereo methods. Our disparity prediction scheme works on a graph pyramid derived from the image whose disparity is to be estimated. We utilize the disparity of an upper graph to predict a small disparity range for the lower graph. Some independent disparity trees (DT) are generated to form a disparity prediction forest (HDPF) over which the cost aggregation is made. When combined with the state-of-the-art tree-based methods, our scheme not only dramatically speeds up the original methods but also improves their performance by alleviating the second drawback of the tree-based methods. This is partially because our DTs overcome the extreme greediness of the MST. Extensive experimental results on some benchmark datasets demonstrate the effectiveness and efficiency of our framework. For example, the segment-tree based stereo matching becomes about 25.57 times faster and 2.2% more accurate on the Middlebury 2006 full-size dataset.
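The core idea of restricting the disparity search can be sketched for a single pixel as follows. This is a simplified illustration, not the paper's method: it omits the graph pyramid and tree-based cost aggregation, and the function names, the factor-of-two scale between pyramid levels, and the `radius` parameter are assumptions.

```python
def predicted_range(coarse_disp, radius, max_disp, scale=2):
    """Map a coarse-level disparity to a narrow search window at the finer level.

    Assumes disparities scale by `scale` between levels (hypothetical choice).
    """
    center = scale * coarse_disp
    lo = max(0, center - radius)
    hi = min(max_disp, center + radius)
    return lo, hi

def best_disparity(costs, lo, hi):
    """Winner-take-all over costs[lo..hi] only, instead of the full range."""
    return min(range(lo, hi + 1), key=lambda d: costs[d])

# Hypothetical matching costs for one pixel over disparities 0..15.
costs = [9, 8, 7, 6, 5, 4, 1, 4, 5, 6, 7, 8, 9, 9, 9, 9]
lo, hi = predicted_range(coarse_disp=3, radius=2, max_disp=15)
d = best_disparity(costs, lo, hi)
searched = hi - lo + 1  # number of candidates examined
```

Here only 5 of the 16 candidate disparities are examined, which is the source of the speed-up in this simplified picture. The result is only as good as the coarse prediction, since a true disparity outside the window cannot be recovered at the finer level.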