CVOct 7, 2021
MSHCNet: Multi-Stream Hybridized Convolutional Networks with Mixed Statistics in Euclidean/Non-Euclidean Spaces and Its Application to Hyperspectral Image ClassificationShuang He, Haitong Tang, Xia Lu et al.
It is well known that hyperspectral images (HSI) contain rich spatial-spectral contextual information, and how to effectively combine both spectral and spatial information using DNN for HSI classification has become a new research hotspot. Compared with CNN with square kernels, GCN have exhibited exciting potential to model spatial contextual structure and conduct flexible convolution on arbitrarily irregular image regions. However, current GCN only using first-order spectral-spatial signatures can result in boundary blurring and isolated misclassification. To address these, we first designed the graph-based second-order pooling (GSOP) operation to obtain contextual nodes information in non-Euclidean space for GCN. Further, we proposed a novel multi-stream hybridized convolutional network (MSHCNet) with combination of first and second order statistics in Euclidean/non-Euclidean spaces to learn and fuse multi-view complementary information to segment HSIs. Specifically, our MSHCNet adopted four parallel streams, which contained G-stream, utilizing the irregular correlation between adjacent land covers in terms of first-order graph in non-Euclidean space; C-stream, adopting convolution operator to learn regular spatial-spectral features in Euclidean space; N-stream, combining first and second order features to learn representative and discriminative regular spatial-spectral features of Euclidean space; S-stream, using GSOP to capture boundary correlations and obtain graph representations from all nodes in graphs of non-Euclidean space. Besides, these feature representations learned from four different streams were fused to integrate the multi-view complementary information for HSI classification. Finally, we evaluated our proposed MSHCNet on three hyperspectral datasets, and experimental results demonstrated that our method significantly outperformed state-of-the-art eight methods.
CVSep 19, 2021
RSI-Net: Two-Stream Deep Neural Network for Remote Sensing Imagesbased Semantic SegmentationShuang He, Xia Lu, Jason Gu et al.
For semantic segmentation of remote sensing images (RSI), trade-off between representation power and location accuracy is quite important. How to get the trade-off effectively is an open question,where current approaches of utilizing very deep models result in complex models with large memory consumption. In contrast to previous work that utilizes dilated convolutions or deep models, we propose a novel two-stream deep neural network for semantic segmentation of RSI (RSI-Net) to obtain improved performance through modeling and propagating spatial contextual structure effectively and a decoding scheme with image-level and graph-level combination. The first component explicitly models correlations between adjacent land covers and conduct flexible convolution on arbitrarily irregular image regions by using graph convolutional network, while densely connected atrous convolution network (DenseAtrousCNet) with multi-scale atrous convolution can expand the receptive fields and obtain image global information. Extensive experiments are implemented on the Vaihingen, Potsdam and Gaofen RSI datasets, where the comparison results demonstrate the superior performance of RSI-Net in terms of overall accuracy (91.83%, 93.31% and 93.67% on three datasets, respectively), F1 score (90.3%, 91.49% and 89.35% on three datasets, respectively) and kappa coefficient (89.46%, 90.46% and 90.37% on three datasets, respectively) when compared with six state-of-the-art RSI semantic segmentation methods.
CVAug 1, 2021
CSC-Unet: A Novel Convolutional Sparse Coding Strategy Based Neural Network for Semantic SegmentationHaitong Tang, Shuang He, Mengduo Yang et al.
It is a challenging task to accurately perform semantic segmentation due to the complexity of real picture scenes. Many semantic segmentation methods based on traditional deep learning insufficiently captured the semantic and appearance information of images, which put limit on their generality and robustness for various application scenes. In this paper, we proposed a novel strategy that reformulated the popularly-used convolution operation to multi-layer convolutional sparse coding block to ease the aforementioned deficiency. This strategy can be possibly used to significantly improve the segmentation performance of any semantic segmentation model that involves convolutional operations. To prove the effectiveness of our idea, we chose the widely-used U-Net model for the demonstration purpose, and we designed CSC-Unet model series based on U-Net. Through extensive analysis and experiments, we provided credible evidence showing that the multi-layer convolutional sparse coding block enables semantic segmentation model to converge faster, can extract finer semantic and appearance information of images, and improve the ability to recover spatial detail information. The best CSC-Unet model significantly outperforms the results of the original U-Net on three public datasets with different scenarios, i.e., 87.14% vs. 84.71% on DeepCrack dataset, 68.91% vs. 67.09% on Nuclei dataset, and 53.68% vs. 48.82% on CamVid dataset, respectively.