CV Dec 31, 2022
Guided Hybrid Quantization for Object detection in Multimodal Remote Sensing Imagery via One-to-one Self-teaching
Jiaqing Zhang, Jie Lei, Weiying Xie et al.
Considering the computation complexity, we propose a Guided Hybrid Quantization with One-to-one Self-Teaching (GHOST) framework. More concretely, we first design a structure called guided quantization self-distillation (GQSD), which is an innovative idea for realizing a lightweight model through the synergy of quantization and distillation. The training process of the quantization model is guided by its full-precision model, which saves time and cost because no huge pre-trained model has to be prepared in advance. Second, we put forward a hybrid quantization (HQ) module to obtain the optimal bit width automatically under a constrained condition where a threshold for distribution distance between the center and samples is applied in the weight value search space. Third, in order to improve information transformation, we propose a one-to-one self-teaching (OST) module to give the student network an ability of self-judgment. A switch control machine (SCM) builds a bridge between the student network and teacher network in the same location to help the teacher to reduce wrong guidance and impart vital knowledge to the student. This distillation method allows a model to learn from itself and gain substantial improvement without any additional supervision. Extensive experiments on a multimodal dataset (VEDAI) and single-modality datasets (DOTA, NWPU, and DIOR) show that object detection based on GHOST outperforms the existing detectors. The tiny parameters (<9.7 MB) and Bit-Operations (BOPs) (<2158 G) compared with any remote sensing-based, lightweight or distillation-based algorithms demonstrate the superiority in the lightweight design domain. Our code and model will be released at https://github.com/icey-zhang/GHOST.
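To make the notion of "bit width" concrete, the sketch below quantizes a handful of weights onto a signed uniform grid at a fixed number of bits and reports the resulting error. It is not the HQ module from the abstract, which selects the bit width automatically. The function names, the symmetric max-abs scaling, and the sample weights are all assumptions chosen for illustration.

```python
def quantize_uniform(weights, bits):
    """Symmetric uniform quantization of floats onto a signed `bits`-bit grid.

    Returns the de-quantized values and the step size (scale).
    Illustrative only; the scaling rule is an assumption, not taken from GHOST.
    """
    if bits < 2:
        raise ValueError("bits must be at least 2 for a signed grid")
    qmax = 2 ** (bits - 1) - 1            # 127 for 8 bits, 7 for 4 bits, 1 for 2 bits
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0.0:
        return [0.0 for _ in weights], 1.0
    scale = max_abs / qmax                # width of one quantization step
    dequantized = []
    for w in weights:
        q = round(w / scale)              # nearest integer level
        q = max(-qmax, min(qmax, q))      # clamp to the representable range
        dequantized.append(q * scale)
    return dequantized, scale


def mean_squared_error(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


# Hypothetical weights, used only to compare bit widths.
weights = [0.82, -0.41, 0.05, -0.97, 0.33, 0.0]
errors = {
    bits: mean_squared_error(weights, quantize_uniform(weights, bits)[0])
    for bits in (2, 4, 8)
}
for bits, err in errors.items():
    print(f"{bits}-bit quantization MSE: {err:.6f}")
```

For these sample weights the error shrinks as the bit width grows, which is the trade-off against model size that a bit-width search has to balance.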
CV Nov 13, 2023
SpectralGPT: Spectral Remote Sensing Foundation Model
Danfeng Hong, Bing Zhang, Xuyang Li et al.
The foundation model has recently garnered significant attention due to its potential to revolutionize the field of visual representation learning in a self-supervised manner. While most foundation models are tailored to effectively process RGB images for various visual tasks, there is a noticeable gap in research focused on spectral data, which offers valuable information for scene understanding, especially in remote sensing (RS) applications. To fill this gap, we created for the first time a universal RS foundation model, named SpectralGPT, which is purpose-built to handle spectral RS images using a novel 3D generative pretrained transformer (GPT). Compared to existing foundation models, SpectralGPT 1) accommodates input images with varying sizes, resolutions, time series, and regions in a progressive training fashion, enabling full utilization of extensive RS big data; 2) leverages 3D token generation for spatial-spectral coupling; 3) captures spectrally sequential patterns via multi-target reconstruction; 4) trains on one million spectral RS images, yielding models with over 600 million parameters. Our evaluation highlights significant performance improvements with pretrained SpectralGPT models, signifying substantial potential in advancing spectral RS big data applications within the field of geoscience across four downstream tasks: single/multi-label scene classification, semantic segmentation, and change detection.
CV May 20, 2022
Hyperspectral Unmixing Based on Nonnegative Matrix Factorization: A Comprehensive Review
Xin-Ru Feng, Heng-Chao Li, Rui Wang et al.
Hyperspectral unmixing has been an important technique that estimates a set of endmembers and their corresponding abundances from a hyperspectral image (HSI). Nonnegative matrix factorization (NMF) plays an increasingly significant role in solving this problem. In this article, we present a comprehensive survey of the NMF-based methods proposed for hyperspectral unmixing. Taking the NMF model as a baseline, we show how to improve NMF by utilizing the main properties of HSIs (e.g., spectral, spatial, and structural information). We categorize three important development directions including constrained NMF, structured NMF, and generalized NMF. Furthermore, several experiments are conducted to illustrate the effectiveness of associated algorithms. Finally, we conclude the article with possible future directions with the purposes of providing guidelines and inspiration to promote the development of hyperspectral unmixing.
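For orientation, the baseline NMF model referred to in the abstract can be written as follows. The notation is assumed here rather than taken from the review: X is the nonnegative data matrix with B bands and N pixels, E holds P endmember spectra, and A holds the abundances. The update rules shown are the standard multiplicative updates for the Frobenius objective, with the product \odot and the fractions taken elementwise. Constrained variants such as those the review surveys would add terms to this objective.

```latex
\min_{E \ge 0,\; A \ge 0} \; \tfrac{1}{2}\,\lVert X - E A \rVert_F^2,
\qquad
X \in \mathbb{R}_{\ge 0}^{B \times N},\quad
E \in \mathbb{R}_{\ge 0}^{B \times P},\quad
A \in \mathbb{R}_{\ge 0}^{P \times N}

E \leftarrow E \odot \frac{X A^{\top}}{E A A^{\top}},
\qquad
A \leftarrow A \odot \frac{E^{\top} X}{E^{\top} E A}
```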
IV Apr 9, 2022
Dual-Stage Approach Toward Hyperspectral Image Super-Resolution
Qiang Li, Yuan Yuan, Xiuping Jia et al.
Hyperspectral imaging provides high spectral resolution at the expense of spatial resolution. Without reducing the spectral resolution, improving the resolution in the spatial domain is a very challenging problem. Motivated by the discovery that hyperspectral images exhibit high similarity between adjacent bands in a large spectral range, in this paper, we explore a new structure for hyperspectral image super-resolution (DualSR), leading to a dual-stage design, i.e., coarse stage and fine stage. In the coarse stage, five bands with high similarity in a certain spectral range are divided into three groups, and the current band is guided to study the potential knowledge. Under the action of an alternative spectral fusion mechanism, the coarse SR image is super-resolved band by band. In order to build the model from a global perspective, an enhanced back-projection method via spectral angle constraint is developed in the fine stage to learn the content of spatial-spectral consistency, dramatically improving the performance gain. Extensive experiments demonstrate the effectiveness of the proposed coarse stage and fine stage. Besides, our network produces state-of-the-art results against existing works in terms of spatial reconstruction and spectral fidelity.
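The "spectral angle" mentioned above is a standard measure of spectral fidelity: the angle between two pixel spectra treated as vectors. The sketch below computes only that quantity. How the paper turns it into a constraint is not stated in the abstract, and the function name and sample spectra are assumptions.

```python
import math


def spectral_angle(a, b):
    """Angle in radians between two spectra given as equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("spectra must have the same number of bands")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("spectral angle is undefined for an all-zero spectrum")
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cosine = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.acos(cosine)


# Hypothetical 4-band spectra: the second is a brighter copy of the first,
# so the angle is (numerically) zero even though the values differ.
reference = [0.10, 0.25, 0.40, 0.30]
brighter = [0.20, 0.50, 0.80, 0.60]
distorted = [0.30, 0.25, 0.10, 0.40]

angle_scaled = spectral_angle(reference, brighter)
angle_distorted = spectral_angle(reference, distorted)
print(f"scaled copy: {angle_scaled:.6f} rad, distorted: {angle_distorted:.6f} rad")
```

Because the angle ignores overall brightness, it isolates changes in spectral shape, which is why it is commonly paired with a separate spatial reconstruction metric.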
CV Sep 13, 2023
Remote Sensing Object Detection Meets Deep Learning: A Meta-review of Challenges and Advances
Xiangrong Zhang, Tianyang Zhang, Guanchun Wang et al.
Remote sensing object detection (RSOD), one of the most fundamental and challenging tasks in the remote sensing field, has received longstanding attention. In recent years, deep learning techniques have demonstrated robust feature representation capabilities and led to a big leap in the development of RSOD techniques. In this era of rapid technical evolution, this review aims to present a comprehensive review of the recent achievements in deep learning based RSOD methods. More than 300 papers are covered in this review. We identify five main challenges in RSOD, including multi-scale object detection, rotated object detection, weak object detection, tiny object detection, and object detection with limited supervision, and systematically review the corresponding methods developed in a hierarchical division manner. We also review the widely used benchmark datasets and evaluation metrics within the field of RSOD, as well as the application scenarios for RSOD. Future research directions are provided for further promoting the research in RSOD.
CV Jul 2, 2022
Pair-Relationship Modeling for Latent Fingerprint Recognition
Yanming Zhu, Xuefei Yin, Xiuping Jia et al.
Latent fingerprints are important for identifying criminal suspects. However, recognizing a latent fingerprint in a collection of reference fingerprints remains a challenge. Most, if not all, existing methods extract representation features of each fingerprint independently and then compare the similarity of these representation features for recognition in a separate step. Without similarity supervision during feature extraction, the extracted representation features may not optimally reflect the similarity of the two compared fingerprints, which is the basis for the matching decision. In this paper, we propose a new scheme that can model the pair-relationship of two fingerprints directly as the similarity feature for recognition. The pair-relationship is modeled by a hybrid deep network which can handle the difficulties of random sizes and corrupted areas of latent fingerprints. Experimental results on two databases show that the proposed method outperforms the state of the art.
CV Apr 17
A B-Spline Function Based 3D Point Cloud Unwrapping Scheme for 3D Fingerprint Recognition and Identification
Mohammad Mogharen Askarin, Jiankun Hu, Min Wang et al.
Three-dimensional (3D) fingerprint recognition and identification offer several advantages over traditional two-dimensional (2D) recognition systems. The contactless nature of 3D fingerprints enhances hygiene and security, reducing the risk of contamination and spoofing. In addition to surface ridge and valley patterns, 3D fingerprints capture depth, curvature, and shape information, enabling the development of more precise and robust authentication systems. Despite recent advancements, significant challenges remain. The topological height of fingerprint pixels complicates the extraction of ridge and valley patterns. Furthermore, registration issues limit the acquisition process, requiring consistent direction and orientation across all samples. To address these challenges, this paper introduces a method that unwraps 3D fingerprints, represented as 3D point clouds, using B-spline curve fitting to mitigate height variation and reduce registration limitations. The unwrapped point cloud is then converted into a grayscale image by mapping the relative heights of the points. This grayscale image is subsequently used for recognition through conventional 2D fingerprint identification methods. The proposed approach demonstrated superior performance in 3D fingerprint recognition, achieving Equal Error Rates (EERs) of 0.2072%, 0.26%, and 0.22% across three experiments, outperforming existing methods. Additionally, the method surpassed the 3D fingerprint flattening technique in both recognition and identification during cross-session experiments, achieving an EER of 1.50% when fingerprints with varying registrations were included.
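As a reminder of what the reported EER figures measure, the sketch below estimates an Equal Error Rate from two lists of similarity scores: the operating point where the false accept rate and false reject rate are closest. It is a generic threshold sweep, not the evaluation code behind the numbers above. The scores are made up, and the convention that higher scores mean "same finger" is an assumption.

```python
def equal_error_rate(genuine, impostor):
    """Approximate EER for similarity scores (higher = more likely a match).

    Sweeps every observed score as a threshold and returns the mean of the
    false accept and false reject rates where the two are closest.
    """
    thresholds = sorted(set(genuine) | set(impostor))
    best_gap, eer = None, None
    for t in thresholds:
        # Impostor pairs accepted at this threshold.
        far = sum(s >= t for s in impostor) / len(impostor)
        # Genuine pairs rejected at this threshold.
        frr = sum(s < t for s in genuine) / len(genuine)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Hypothetical match scores for illustration only.
genuine_scores = [0.9, 0.8, 0.4, 0.7]
impostor_scores = [0.1, 0.5, 0.3, 0.2]
eer_overlap = equal_error_rate(genuine_scores, impostor_scores)
eer_separable = equal_error_rate([0.9, 0.8, 0.95], [0.1, 0.2, 0.3])
print(f"overlapping scores EER: {eer_overlap:.2%}")
print(f"separable scores EER:   {eer_separable:.2%}")
```

With so few scores the estimate is coarse; larger score sets, or interpolation between thresholds, would give a finer value.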
CV Jul 8, 2025
Geo-Registration of Terrestrial LiDAR Point Clouds with Satellite Images without GNSS
Xinyu Wang, Muhammad Ibrahim, Haitian Wang et al.
Accurate geo-registration of LiDAR point clouds remains a significant challenge in urban environments where Global Navigation Satellite System (GNSS) signals are denied or degraded. Existing methods typically rely on real-time GNSS and Inertial Measurement Unit (IMU) data, which require pre-calibration and assume stable signals. However, this assumption often fails in dense cities, resulting in localization errors. To address this, we propose a structured geo-registration method that accurately aligns LiDAR point clouds with satellite images, enabling frame-wise geo-registration and city-scale 3D reconstruction without prior localization. Our method uses a pre-trained Point Transformer to segment road points, then extracts road skeletons and intersections from the point cloud and the satellite image. Global alignment is achieved through rigid transformation using corresponding intersection points, followed by local non-rigid refinement with radial basis function (RBF) interpolation. Elevation discrepancies are corrected using terrain data from the Shuttle Radar Topography Mission (SRTM). To evaluate geo-registration accuracy, we measure the absolute distances between the roads extracted from the two modalities. Our method is validated on the KITTI benchmark and a newly collected dataset of Perth, Western Australia. On KITTI, our method achieves a mean planimetric alignment error of 0.69 m, representing a 50% improvement over the raw KITTI data. On the Perth dataset, it achieves a mean planimetric error of 2.17 m from GNSS values extracted from Google Maps, corresponding to a 57.4% improvement over rigid alignment. Elevation correlation improved by 30.5% (KITTI) and 55.8% (Perth). A demonstration video is available at: https://youtu.be/0wkACAB-O6E.
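The global alignment step, a rigid transformation estimated from corresponding intersection points, can be illustrated with the closed-form least-squares fit for 2D point pairs. This is a generic sketch, not the authors' implementation. The function names and the intersection coordinates are hypothetical, and the paper may estimate the transform differently (for example with outlier rejection).

```python
import math


def fit_rigid_2d(src, dst):
    """Least-squares rotation angle and translation mapping src onto dst.

    src and dst are equal-length lists of (x, y) pairs in corresponding order.
    """
    n = len(src)
    sx = sum(p[0] for p in src) / n
    sy = sum(p[1] for p in src) / n
    dx = sum(q[0] for q in dst) / n
    dy = sum(q[1] for q in dst) / n
    dot = cross = 0.0
    for (px, py), (qx, qy) in zip(src, dst):
        ax, ay = px - sx, py - sy          # centered source point
        bx, by = qx - dx, qy - dy          # centered target point
        dot += ax * bx + ay * by
        cross += ax * by - ay * bx
    theta = math.atan2(cross, dot)         # optimal rotation angle
    c, s = math.cos(theta), math.sin(theta)
    # Translation that maps the rotated source centroid onto the target centroid.
    tx = dx - (c * sx - s * sy)
    ty = dy - (s * sx + c * sy)
    return theta, (tx, ty)


def apply_rigid_2d(points, theta, t):
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + t[0], s * x + c * y + t[1]) for x, y in points]


# Hypothetical intersections in a local LiDAR frame and their map positions.
lidar_pts = [(0.0, 0.0), (10.0, 0.0), (10.0, 5.0), (3.0, 7.0)]
map_pts = apply_rigid_2d(lidar_pts, 0.3, (100.0, -20.0))

theta_est, t_est = fit_rigid_2d(lidar_pts, map_pts)
print(f"rotation: {theta_est:.4f} rad, translation: ({t_est[0]:.2f}, {t_est[1]:.2f})")
```

A rigid fit like this only fixes global pose, so locally varying residuals remain afterwards. That is consistent with the abstract following it with a non-rigid RBF refinement.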
CV Aug 11, 2025
Hyperspectral Imaging
Danfeng Hong, Chenyu Li, Naoto Yokoya et al.
Hyperspectral imaging (HSI) is an advanced sensing modality that simultaneously captures spatial and spectral information, enabling non-invasive, label-free analysis of material, chemical, and biological properties. This Primer presents a comprehensive overview of HSI, from the underlying physical principles and sensor architectures to key steps in data acquisition, calibration, and correction. We summarize common data structures and highlight classical and modern analysis methods, including dimensionality reduction, classification, spectral unmixing, and AI-driven techniques such as deep learning. Representative applications across Earth observation, precision agriculture, biomedicine, industrial inspection, cultural heritage, and security are also discussed, emphasizing HSI's ability to uncover sub-visual features for advanced monitoring, diagnostics, and decision-making. Persistent challenges, such as hardware trade-offs, acquisition variability, and the complexity of high-dimensional data, are examined alongside emerging solutions, including computational imaging, physics-informed modeling, cross-modal fusion, and self-supervised learning. Best practices for dataset sharing, reproducibility, and metadata documentation are further highlighted to support transparency and reuse. Looking ahead, we explore future directions toward scalable, real-time, and embedded HSI systems, driven by sensor miniaturization, self-supervised learning, and foundation models. As HSI evolves into a general-purpose, cross-disciplinary platform, it holds promise for transformative applications in science, technology, and society.
IV Jul 14, 2021
Multi-Attention Generative Adversarial Network for Remote Sensing Image Super-Resolution
Meng Xu, Zhihao Wang, Jiasong Zhu et al.
Image super-resolution (SR) methods can generate remote sensing images with high spatial resolution without increasing the cost, thereby providing a feasible way to acquire high-resolution remote sensing images, which are difficult to obtain due to the high cost of acquisition equipment and complex weather. Clearly, image super-resolution is a severely ill-posed problem. Fortunately, with the development of deep learning, the powerful fitting ability of deep neural networks has solved this problem to some extent. In this paper, we propose a network based on the generative adversarial network (GAN) to generate high resolution remote sensing images, named the multi-attention generative adversarial network (MA-GAN). We first design a GAN-based framework for the image SR task. The core of accomplishing the SR task is the image generator with post-upsampling that we designed. The main body of the generator contains two blocks; one is the pyramidal convolution in the residual-dense block (PCRDB), and the other is the attention-based upsample (AUP) block. The attentioned pyramidal convolution (AttPConv) in the PCRDB block is a module that combines multi-scale convolution and channel attention to automatically learn and adjust the scaling of the residuals for better results. The AUP block is a module that combines pixel attention (PA) to perform arbitrary multiples of upsampling. These two blocks work together to help generate better quality images. For the loss function, we design a loss function based on pixel loss and introduce both adversarial loss and feature loss to guide the generator learning. We have compared our method with several state-of-the-art methods on a remote sensing scene image dataset, and the experimental results consistently demonstrate the effectiveness of the proposed MA-GAN.
CV May 10, 2021
An Attention-Fused Network for Semantic Segmentation of Very-High-Resolution Remote Sensing Imagery
Xuan Yang, Shanshan Li, Zhengchao Chen et al.
Semantic segmentation is an essential part of deep learning. In recent years, with the development of remote sensing big data, semantic segmentation has been increasingly used in remote sensing. Deep convolutional neural networks (DCNNs) face the challenge of feature fusion: very-high-resolution remote sensing image multisource data fusion can increase the network's learnable information, which is conducive to correctly classifying target objects by DCNNs; simultaneously, the fusion of high-level abstract features and low-level spatial features can improve the classification accuracy at the border between target objects. In this paper, we propose a multipath encoder structure to extract features of multipath inputs, a multipath attention-fused block module to fuse multipath features, and a refinement attention-fused block module to fuse high-level abstract features and low-level spatial features. Furthermore, we propose a novel convolutional neural network architecture, named attention-fused network (AFNet). Based on our AFNet, we achieve state-of-the-art performance with an overall accuracy of 91.7% and a mean F1 score of 90.96% on the ISPRS Vaihingen 2D dataset and an overall accuracy of 92.1% and a mean F1 score of 93.44% on the ISPRS Potsdam 2D dataset.
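The two metrics quoted above, overall accuracy and mean F1, can both be derived from a per-class confusion matrix. The sketch below uses the textbook definitions with rows as reference labels and columns as predictions. It is not the ISPRS benchmark's evaluation script, which may apply its own conventions, and the 2x2 matrix is invented for illustration.

```python
def overall_accuracy(cm):
    """Fraction of pixels on the diagonal of a square confusion matrix."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total


def mean_f1(cm):
    """Unweighted mean of per-class F1 scores (rows = reference, cols = predicted)."""
    k = len(cm)
    scores = []
    for c in range(k):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                          # reference c, predicted other
        fp = sum(cm[r][c] for r in range(k)) - tp     # predicted c, reference other
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / k


# Hypothetical pixel counts for a two-class problem.
confusion = [
    [8, 2],   # reference class 0
    [1, 9],   # reference class 1
]
oa = overall_accuracy(confusion)
mf1 = mean_f1(confusion)
print(f"overall accuracy: {oa:.4f}, mean F1: {mf1:.4f}")
```

The two numbers need not move together: overall accuracy is dominated by frequent classes, while mean F1 weights every class equally.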
CV Nov 29, 2019
Online Structured Sparsity-based Moving Object Detection from Satellite Videos
Junpeng Zhang, Xiuping Jia, Jiankun Hu et al.
Inspired by the recent developments in computer vision, low-rank and structured sparse matrix decomposition can potentially be used to extract moving objects in satellite videos. This set of approaches seeks rank minimization on the background, which typically requires batch-based optimization over a sequence of frames; this causes delays in processing and limits their applications. To remedy this delay, we propose an Online Low-rank and Structured Sparse Decomposition (O-LSD). O-LSD reformulates the batch-based low-rank matrix decomposition with the structured sparse penalty to its equivalent frame-wise separable counterpart, which then defines a stochastic optimization problem for online subspace basis estimation. In order to promote online processing, O-LSD conducts the foreground and background separation and the subspace basis update alternatingly for every frame in a video. We also show the convergence of O-LSD theoretically. Experimental results on two satellite videos demonstrate that the performance of O-LSD in terms of accuracy and time consumption is comparable with the batch-based approaches, with significantly reduced delay in processing.
CV Aug 26, 2019
Error Bounded Foreground and Background Modeling for Moving Object Detection in Satellite Videos
Junpeng Zhang, Xiuping Jia, Jiankun Hu
Detecting moving objects from ground-based videos is commonly achieved by using background subtraction techniques. Low-rank matrix decomposition inspires a set of state-of-the-art approaches for this task. It is integrated with structured sparsity regularization to achieve background subtraction in the developed method of Low-rank and Structured Sparse Decomposition (LSD). However, when this method is applied to satellite videos where spatial resolution is poor and targets' contrast to the background is low, its performance is limited as the data no longer fits adequately either the foreground structure or the background model. In this paper, we handle these unexplained data explicitly and address the moving target detection from space as one of the pioneer studies. We propose a technique by extending the decomposition formulation with bounded errors, named Extended Low-rank and Structured Sparse Decomposition (E-LSD). This formulation integrates low-rank background, structured sparse foreground and their residuals in a matrix decomposition problem. We provide an effective solution by introducing an alternative treatment and adopting the direct extension of Alternating Direction Method of Multipliers (ADMM). The proposed E-LSD was validated on two satellite videos, and experimental results demonstrate the improvement in background modeling with boosted moving object detection precision over state-of-the-art methods.
CV Nov 13, 2017
Conditional Random Field and Deep Feature Learning for Hyperspectral Image Segmentation
Fahim Irfan Alam, Jun Zhou, Alan Wee-Chung Liew et al.
Image segmentation is considered to be one of the critical tasks in hyperspectral remote sensing image processing. Recently, the convolutional neural network (CNN) has established itself as a powerful model in segmentation and classification by demonstrating excellent performance. The use of a graphical model such as a conditional random field (CRF) contributes further in capturing contextual information and thus improving the segmentation performance. In this paper, we propose a method to segment hyperspectral images by considering both spectral and spatial information via a combined framework consisting of CNN and CRF. We use multiple spectral cubes to learn deep features using CNN, and then formulate deep CRF with CNN-based unary and pairwise potential functions to effectively extract the semantic correlations between patches consisting of three-dimensional data cubes. Effective piecewise training is applied in order to avoid the computationally expensive iterative CRF inference. Furthermore, we introduce a deep deconvolution network that improves the segmentation masks. We also introduce a new dataset and evaluate our proposed method on it, along with several widely adopted benchmark datasets, to assess the effectiveness of our method. By comparing our results with those from several state-of-the-art models, we show the promising potential of our method.
CV Jun 15, 2017
Effective Sequential Classifier Training for SVM-based Multitemporal Remote Sensing Image Classification
Yiqing Guo, Xiuping Jia, David Paull
The explosive availability of remote sensing images has challenged supervised classification algorithms such as Support Vector Machines (SVM), as training samples tend to be highly limited due to the expensive and laborious task of ground truthing. The temporal correlation and spectral similarity between multitemporal images have opened up an opportunity to alleviate this problem. In this study, an SVM-based Sequential Classifier Training (SCT-SVM) approach is proposed for multitemporal remote sensing image classification. The approach leverages the classifiers of previous images to reduce the required number of training samples for the classifier training of an incoming image. For each incoming image, a rough classifier is first predicted based on the temporal trend of a set of previous classifiers. The predicted classifier is then fine-tuned into a more accurate position with current training samples. This approach can be applied progressively to sequential image data, with only a small number of training samples being required from each image. Experiments were conducted with Sentinel-2A multitemporal data over an agricultural area in Australia. Results showed that the proposed SCT-SVM achieved better classification accuracies compared with two state-of-the-art model transfer algorithms. When training data are insufficient, the overall classification accuracy of the incoming image was improved from 76.18% to 94.02% with the proposed SCT-SVM, compared with those obtained without the assistance from previous images. These results demonstrate that the leverage of a priori information from previous images can provide advantageous assistance for later images in multitemporal image classification.
CV Jul 1, 2014
A New Path to Construct Parametric Orientation Field: Sparse FOMFE Model and Compressed Sparse FOMFE Model
Jinwei Xu, Jiankun Hu, Xiuping Jia
The orientation field, representing the fingerprint ridge structure direction, plays a crucial role in fingerprint-related image processing tasks. The orientation field can be constructed by either non-parametric or parametric methods. In this paper, the advantages and disadvantages of the existing non-parametric and parametric approaches are briefly summarized. Through further investigation of constructing the orientation field by parametric techniques, two new models, the sparse FOMFE model and the compressed sparse FOMFE model, are introduced, based on the rapidly developing signal sparse representation and compressed sensing theories. The experiments on a high-quality fingerprint image dataset (plain and rolled print) and a poor-quality fingerprint image dataset (latent print) demonstrate their feasibility for constructing the orientation field in a sparse or even compressed sparse mode. The comparisons among the state-of-the-art orientation field modeling approaches show that the proposed two models are potentially applicable to big data-oriented fingerprint indexing tasks.
CR Jun 26, 2014
A Fully Automated Latent Fingerprint Matcher with Embedded Self-learning Segmentation Module
Jinwei Xu, Jiankun Hu, Xiuping Jia
Latent fingerprints have practical value for identifying suspects who have unintentionally left a trace of a fingerprint at a crime scene. However, designing a fully automated latent fingerprint matcher is a very challenging task, as it needs to address many difficult issues, including the separation of overlapping structured patterns over the partial and poor quality latent fingerprint image, and finding a match against a large background database that would have different resolutions. Currently there is no fully automated latent fingerprint matcher available to the public, and most literature reports have utilized a specialized latent fingerprint matcher, COTS3, which is not accessible to the public. This makes it infeasible to assess and compare the relevant research work, which is vital for this research community. In this study, we aim to develop a fully automated latent matcher for adaptive detection of the region of interest and robust matching of latent prints. Unlike the manually conducted matching procedure, the proposed latent matcher can run like a sealed black box without any manual intervention. This matcher consists of the following two modules: (i) the dictionary learning-based region of interest (ROI) segmentation scheme; and (ii) the genetic algorithm-based minutiae set matching unit. Experimental results on the NIST SD27 latent fingerprint database demonstrate that the proposed matcher outperforms the currently public state-of-the-art latent fingerprint matcher.