Shaobo Xia

CV
h-index: 74
12 papers
446 citations
Novelty: 45%
AI Score: 49

12 Papers

CV | Apr 12, 2023 | Code
SpectralDiff: A Generative Framework for Hyperspectral Image Classification with Diffusion Models

Ning Chen, Jun Yue, Leyuan Fang et al.

Hyperspectral image (HSI) classification is an important problem in the remote sensing field, with extensive applications in earth science. In recent years, a large number of deep learning-based HSI classification methods have been proposed. However, existing methods have limited ability to handle high-dimensional, highly redundant, and complex data, making it challenging to capture the spectral-spatial distributions of data and relationships between samples. To address this issue, we propose a generative framework for HSI classification with diffusion models (SpectralDiff) that effectively mines the distribution information of high-dimensional and highly redundant data by iteratively denoising and explicitly constructing the data generation process, thus better reflecting the relationships between samples. The framework consists of a spectral-spatial diffusion module and an attention-based classification module. The spectral-spatial diffusion module adopts forward and reverse spectral-spatial diffusion processes to achieve adaptive construction of sample relationships without requiring prior knowledge of graphical structure or neighborhood information. It captures spectral-spatial distribution and contextual information of objects in HSI and mines unsupervised spectral-spatial diffusion features within the reverse diffusion process. Finally, these features are fed into the attention-based classification module for per-pixel classification. The diffusion features can facilitate cross-sample perception via reconstruction distribution, leading to improved classification performance. Experiments on three public HSI datasets demonstrate that the proposed method can achieve better performance than state-of-the-art methods. For the sake of reproducibility, the source code of SpectralDiff will be publicly available at https://github.com/chenning0115/SpectralDiff.
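
The forward half of the diffusion process mentioned in the abstract can be illustrated with a minimal sketch. It assumes the standard DDPM closed form x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps with a linear noise schedule; the schedule values, the function names, and the 4-band example spectrum are assumptions for illustration and are not taken from SpectralDiff.

```python
import math
import random

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances (an assumed schedule, not from the paper)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def cumulative_alphas(betas):
    """alpha_bar_t is the running product of (1 - beta_s) for s <= t."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out

def forward_diffuse(x0, t, alpha_bars, rng):
    """Draw x_t from q(x_t | x_0) and return it together with the noise used."""
    a = alpha_bars[t]
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(a) * v + math.sqrt(1.0 - a) * n for v, n in zip(x0, noise)]
    return xt, noise

rng = random.Random(0)
alpha_bars = cumulative_alphas(linear_beta_schedule(1000))
spectrum = [0.12, 0.35, 0.80, 0.55]  # hypothetical 4-band pixel spectrum
noisy, eps = forward_diffuse(spectrum, t=500, alpha_bars=alpha_bars, rng=rng)
```

A denoising network would be trained to predict `eps` from `noisy` and `t`; in the framework described above, intermediate features of such a network are what the classification module consumes.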

CV | Jan 19, 2023
Dif-Fusion: Towards High Color Fidelity in Infrared and Visible Image Fusion with Diffusion Models

Jun Yue, Leyuan Fang, Shaobo Xia et al.

Color plays an important role in human visual perception, reflecting the spectrum of objects. However, existing infrared and visible image fusion methods rarely explore how to handle multi-spectral/channel data directly and achieve high color fidelity. This paper addresses this issue by proposing a novel method based on diffusion models, termed Dif-Fusion, to generate the distribution of the multi-channel input data, which increases the ability of multi-source information aggregation and the fidelity of colors. Specifically, instead of converting multi-channel images into single-channel data as in existing fusion methods, we create the multi-channel data distribution with a denoising network in a latent space with forward and reverse diffusion processes. Then, we use the denoising network to extract the multi-channel diffusion features with both visible and infrared information. Finally, we feed the multi-channel diffusion features to the multi-channel fusion module to directly generate the three-channel fused image. To retain the texture and intensity information, we propose a multi-channel gradient loss and an intensity loss. Along with the current evaluation metrics for measuring texture and intensity fidelity, we introduce a new evaluation metric to quantify color fidelity. Extensive experiments indicate that our method is more effective than other state-of-the-art image fusion methods, especially in color fidelity.

CV | Jan 13
Source-Free Domain Adaptation for Geospatial Point Cloud Semantic Segmentation

Yuan Gao, Di Cao, Xiaohuan Xi et al.

Semantic segmentation of 3D geospatial point clouds is pivotal for remote sensing applications. However, variations in geographic patterns across regions and data acquisition strategies induce significant domain shifts, severely degrading the performance of deployed models. Existing domain adaptation methods typically rely on access to source-domain data. However, this requirement is rarely met due to data privacy concerns, regulatory policies, and data transmission limitations. This motivates the largely underexplored setting of source-free unsupervised domain adaptation (SFUDA), where only a pretrained model and unlabeled target-domain data are available. In this paper, we propose LoGo (Local-Global Dual-Consensus), a novel SFUDA framework specifically designed for geospatial point clouds. At the local level, we introduce a class-balanced prototype estimation module that abandons conventional global threshold filtering in favor of an intra-class independent anchor mining strategy. This ensures that robust feature prototypes can be generated even for sample-scarce tail classes, effectively mitigating the feature collapse caused by long-tailed distributions. At the global level, we introduce an optimal transport-based global distribution alignment module that formulates pseudo-label assignment as a global optimization problem. By enforcing global distribution constraints, this module effectively corrects the over-dominance of head classes inherent in local greedy assignments, preventing model predictions from being severely biased towards majority classes. Finally, we propose a dual-consistency pseudo-label filtering mechanism. This strategy retains only high-confidence pseudo-labels where local multi-augmented ensemble predictions align with global optimal transport assignments for self-training.
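
The idea of treating pseudo-label assignment as a global optimal transport problem, and then keeping only labels on which the local and global views agree, can be sketched as follows. This is a minimal illustration using Sinkhorn iterations with an assumed class prior; the function names, the temperature `eps`, the prior, and the toy probabilities are hypothetical, and the plain argmax stands in for the multi-augmented ensemble prediction described in the abstract.

```python
def sinkhorn_assign(probs, class_prior, n_iters=50, eps=0.1):
    """Rescale predictions so each class receives the mass given by class_prior.

    probs: list of per-point class probabilities (N rows, K columns).
    Returns a transport plan whose rows each sum to 1/N.
    """
    n, k = len(probs), len(class_prior)
    # Sharpened kernel p ** (1 / eps); the floor avoids zero columns.
    q = [[max(p, 1e-12) ** (1.0 / eps) for p in row] for row in probs]
    for _ in range(n_iters):
        # Scale columns toward the target class marginals.
        col_sums = [sum(q[i][j] for i in range(n)) for j in range(k)]
        q = [[q[i][j] * class_prior[j] / col_sums[j] for j in range(k)]
             for i in range(n)]
        # Scale rows so that every point carries mass 1/N.
        q = [[v / (sum(row) * n) for v in row] for row in q]
    return q

def argmax(row):
    return max(range(len(row)), key=row.__getitem__)

# Toy predictions biased toward class 0 (the "head" class).
probs = [[0.9, 0.1], [0.8, 0.2], [0.55, 0.45], [0.3, 0.7]]
plan = sinkhorn_assign(probs, class_prior=[0.5, 0.5])

greedy_labels = [argmax(r) for r in probs]  # local, per-point decision
ot_labels = [argmax(r) for r in plan]       # global, prior-constrained decision
keep = [g == o for g, o in zip(greedy_labels, ot_labels)]  # agreement filter
```

In this toy case the third point is assigned to class 0 locally but to class 1 under the global constraint, so the agreement filter drops it from self-training.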

CV | Feb 16 | Code
Cross-view Domain Generalization via Geometric Consistency for LiDAR Semantic Segmentation

Jindong Zhao, Yuan Gao, Yang Xia et al.

Domain-generalized LiDAR semantic segmentation (LSS) seeks to train models on source-domain point clouds that generalize reliably to multiple unseen target domains, which is essential for real-world LiDAR applications. However, existing approaches assume similar acquisition views (e.g., vehicle-mounted) and struggle in cross-view scenarios, where observations differ substantially due to viewpoint-dependent structural incompleteness and non-uniform point density. Accordingly, we formulate cross-view domain generalization for LiDAR semantic segmentation and propose a novel framework, termed CVGC (Cross-View Geometric Consistency). Specifically, we introduce a cross-view geometric augmentation module that models viewpoint-induced variations in visibility and sampling density, generating multiple cross-view observations of the same scene. Subsequently, a geometric consistency module enforces consistent semantic and occupancy predictions across geometrically augmented point clouds of the same scene. Extensive experiments on six public LiDAR datasets establish the first systematic evaluation of cross-view domain generalization for LiDAR semantic segmentation, demonstrating that CVGC consistently outperforms state-of-the-art methods when generalizing from a single source domain to multiple target domains with heterogeneous acquisition viewpoints. The source code will be publicly available at https://github.com/KintomZi/CVGC-DG

CV | May 15, 2025 | Code
APCoTTA: Continual Test-Time Adaptation for Semantic Segmentation of Airborne LiDAR Point Clouds

Yuan Gao, Shaobo Xia, Sheng Nie et al.

Airborne laser scanning (ALS) point cloud segmentation is a fundamental task for large-scale 3D scene understanding. In real-world applications, models are typically fixed after training. However, domain shifts caused by changes in the environment, sensor types, or sensor degradation often lead to a decline in model performance. Continual Test-Time Adaptation (CTTA) offers a solution by adapting a source-pretrained model to evolving, unlabeled target domains. Despite its potential, research on ALS point clouds remains limited, facing challenges such as the absence of standardized datasets and the risk of catastrophic forgetting and error accumulation during prolonged adaptation. To tackle these challenges, we propose APCoTTA, the first CTTA method tailored for ALS point cloud semantic segmentation. We propose a dynamic trainable layer selection module. This module utilizes gradient information to select low-confidence layers for training, and the remaining layers are kept frozen, mitigating catastrophic forgetting. To further reduce error accumulation, we propose an entropy-based consistency loss. By filtering out unreliable samples based on entropy, we apply the consistency loss only to reliable samples, enhancing model stability. In addition, we propose a random parameter interpolation mechanism, which randomly blends parameters from the selected trainable layers with those of the source model. This approach helps balance target adaptation and source knowledge retention, further alleviating forgetting. Finally, we construct two benchmarks, ISPRSC and H3DC, to address the lack of CTTA benchmarks for ALS point cloud segmentation. Experimental results demonstrate that APCoTTA achieves the best performance on both benchmarks, with mIoU improvements of approximately 9% and 14% over direct inference. The new benchmarks and code are available at https://github.com/Gaoyuan2/APCoTTA.
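
Two of the mechanisms named in the abstract, entropy-based sample selection and random parameter interpolation, can be illustrated with a small sketch. The abstract does not give the exact rules, so the entropy threshold, the per-parameter blend probability `blend_prob`, and the mixing weight `alpha` below are assumptions chosen for illustration only.

```python
import math
import random

def entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def reliable_mask(prob_rows, threshold):
    """True for predictions whose entropy is below the (assumed) threshold."""
    return [entropy(p) < threshold for p in prob_rows]

def random_interpolate(adapted, source, blend_prob, alpha, rng):
    """Randomly pull adapted parameters back toward the source parameters.

    Each parameter is blended with probability blend_prob; a blended value is
    alpha * source + (1 - alpha) * adapted. Both knobs are hypothetical.
    """
    return [
        alpha * s + (1.0 - alpha) * a if rng.random() < blend_prob else a
        for a, s in zip(adapted, source)
    ]

predictions = [[0.98, 0.01, 0.01], [0.34, 0.33, 0.33]]
mask = reliable_mask(predictions, threshold=0.5)  # consistency loss only where True

rng = random.Random(0)
source_params = [0.0, 0.0, 0.0, 0.0]
adapted_params = [1.0, 1.0, 1.0, 1.0]
blended = random_interpolate(adapted_params, source_params,
                             blend_prob=0.5, alpha=0.5, rng=rng)
```

Setting `blend_prob` to 0 leaves the adapted parameters untouched, while `blend_prob=1` with `alpha=1` restores the source model, which makes the trade-off between adaptation and retention explicit.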

CV | Oct 15, 2025 | Code
UniVector: Unified Vector Extraction via Instance-Geometry Interaction

Yinglong Yan, Jun Yue, Shaobo Xia et al.

Vector extraction (VE) retrieves structured vector geometry from raster images, offering high-fidelity representation and broad applicability. Existing methods, however, are usually tailored to a single vector type (e.g., polygons, polylines, line segments), requiring separate models for different structures. This stems from treating instance attributes (category, structure) and geometric attributes (point coordinates, connections) independently, limiting the ability to capture complex structures. Inspired by the human brain's simultaneous use of semantic and spatial interactions in visual perception, we propose UniVector, a unified VE framework that leverages instance-geometry interaction to extract multiple vector types within a single model. UniVector encodes vectors as structured queries containing both instance- and geometry-level information, and iteratively updates them through an interaction module for cross-level context exchange. A dynamic shape constraint further refines global structures and key points. To benchmark multi-structure scenarios, we introduce the Multi-Vector dataset with diverse polygons, polylines, and line segments. Experiments show UniVector sets a new state of the art on both single- and multi-structure VE tasks. Code and dataset will be released at https://github.com/yyyyll0ss/UniVector.

IV | Apr 23, 2020 | Code
Uncertainty Quantification for Hyperspectral Image Denoising Frameworks based on Low-rank Matrix Approximation

Jingwei Song, Shaobo Xia, Jun Wang et al.

Sliding-window-based low-rank matrix approximation (LRMA) is a technique widely used in hyperspectral image (HSI) denoising or completion. However, the uncertainty quantification of the restored HSI has not been addressed to date. Accurate uncertainty quantification of the denoised HSI facilitates applications such as multi-source or multi-scale data fusion, data assimilation, and product uncertainty quantification, since these applications require an accurate approach to describe the statistical distributions of the input data. Therefore, we propose a prior-free, closed-form, element-wise uncertainty quantification method for LRMA-based HSI restoration. Our closed-form algorithm overcomes the difficulty of the HSI patch mixing problem caused by the sliding-window strategy used in the conventional LRMA process. The proposed approach only requires the uncertainty of the observed HSI and provides the uncertainty result relatively rapidly, with computational complexity similar to that of the LRMA technique. We conduct extensive experiments to validate the estimation accuracy of the proposed closed-form uncertainty approach. The method is robust to at least 10% random impulse noise at the cost of 10-20% additional processing time compared to the LRMA. The experiments indicate that the proposed closed-form uncertainty quantification method is more applicable to real-world applications than the baseline Monte Carlo test, which is computationally expensive. The code is available in the attachment and will be released after the acceptance of this paper.

CV | Apr 13, 2024
Diffusion Models Meet Remote Sensing: Principles, Methods, and Perspectives

Yidan Liu, Jun Yue, Shaobo Xia et al.

As a newly emerging advance in deep generative models, diffusion models have achieved state-of-the-art results in many fields, including computer vision, natural language processing, and molecule design. The remote sensing (RS) community has also noticed the powerful ability of diffusion models and quickly applied them to a variety of tasks for image processing. Given the rapid increase in research on diffusion models in the field of RS, it is necessary to conduct a comprehensive review of existing diffusion model-based RS papers, to help researchers recognize the potential of diffusion models and provide some directions for further exploration. Specifically, this article first introduces the theoretical background of diffusion models, and then systematically reviews the applications of diffusion models in RS, including image generation, enhancement, and interpretation. Finally, the limitations of existing RS diffusion models and worthy research directions for further exploration are discussed and summarized.

CV | Dec 11, 2023
Weakly Supervised Point Cloud Segmentation via Conservative Propagation of Scene-level Labels

Shaobo Xia, Jun Yue, Kacper Kania et al.

We propose a weakly supervised semantic segmentation method for point clouds that predicts "per-point" labels from just "whole-scene" annotations. The key challenge here is the discrepancy between the target of dense per-point semantic prediction and training losses derived from only scene-level labels. To address this, in addition to the typical weakly-supervised setup that supervises all points with the scene label, we propose to conservatively propagate the scene-level labels to points selectively. Specifically, we over-segment point cloud features via unsupervised clustering in the entire dataset and form primitives. We then associate scene-level labels with primitives through bipartite matching. Next, we allow labels to pass through this primitive-label relationship, while further encouraging features to form narrow clusters around the primitives. Importantly, through bipartite matching, this additional pathway through which labels flow only propagates scene labels to the most relevant points, reducing the potential negative impact caused by the global approach that existing methods take. We evaluate our method on the ScanNet and S3DIS datasets, outperforming the state of the art by a large margin.
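
The bipartite matching step between scene-level labels and primitives can be illustrated with a brute-force sketch. The cost matrix below is made up for illustration (how the paper derives its matching costs is not stated in the abstract), and exhaustive search is only practical for tiny inputs; at realistic sizes one would typically use the Hungarian algorithm, for example `scipy.optimize.linear_sum_assignment`.

```python
from itertools import permutations

def match_labels_to_primitives(cost):
    """Minimum-cost one-to-one matching of labels (rows) to primitives (columns).

    cost[l][p] is the cost of attaching scene label l to primitive p.
    Requires at least as many primitives as labels.
    Returns ({label: primitive}, total_cost).
    """
    n_labels, n_prims = len(cost), len(cost[0])
    best, best_cost = None, float("inf")
    for prims in permutations(range(n_prims), n_labels):
        total = sum(cost[l][p] for l, p in enumerate(prims))
        if total < best_cost:
            best, best_cost = prims, total
    return dict(enumerate(best)), best_cost

# Hypothetical costs for 3 scene labels and 3 primitives (lower is better).
cost = [
    [0.9, 0.1, 0.5],
    [0.2, 0.8, 0.7],
    [0.6, 0.4, 0.3],
]
assignment, total = match_labels_to_primitives(cost)
```

Because each label is tied to exactly one primitive, labels reach only the points belonging to the matched primitives, which is the conservative behavior the abstract describes.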

CV | Mar 24, 2025
LiDAR Remote Sensing Meets Weak Supervision: Concepts, Methods, and Perspectives

Yuan Gao, Shaobo Xia, Pu Wang et al.

Light detection and ranging (LiDAR) remote sensing encompasses two major directions: data interpretation and parameter inversion. However, both directions rely heavily on costly and labor-intensive labeled data and field measurements, which constrains their scalability and spatiotemporal adaptability. Weakly Supervised Learning (WSL) provides a unified framework to address these limitations. This paper departs from the traditional view that treats interpretation and inversion as separate tasks and offers a systematic review of recent advances in LiDAR remote sensing from a unified WSL perspective. We cover typical WSL settings, including incomplete supervision (e.g., sparse point labels), inexact supervision (e.g., scene-level tags), inaccurate supervision (e.g., noisy labels), and cross-domain supervision (e.g., domain adaptation/generalization), and corresponding techniques such as pseudo-labeling, consistency regularization, self-training, and label refinement, which collectively enable robust learning from limited and weak annotations. We further analyze LiDAR-specific challenges (e.g., irregular geometry, data sparsity, domain heterogeneity) that require tailored weak supervision, and examine how sparse LiDAR observations can guide joint learning with other remote-sensing data for continuous surface-parameter retrieval. Finally, we highlight future directions where WSL acts as a bridge between LiDAR and foundation models to leverage large-scale multimodal datasets and reduce labeling costs, while also enabling broader WSL-driven advances in generalization, open-world adaptation, and scalable LiDAR remote sensing.

CV | Feb 18, 2025
NoKSR: Kernel-Free Neural Surface Reconstruction via Point Cloud Serialization

Zhen Li, Weiwei Sun, Shrisudhan Govindarajan et al.

We present a novel approach to large-scale point cloud surface reconstruction by developing an efficient framework that converts an irregular point cloud into a signed distance field (SDF). Our backbone builds upon recent transformer-based architectures (i.e., PointTransformerV3) that serialize the point cloud into a locality-preserving sequence of tokens. We efficiently predict the SDF value at a point by aggregating nearby tokens, where fast approximate neighbors can be retrieved thanks to the serialization. We serialize the point cloud at different levels/scales, and non-linearly aggregate a feature to predict the SDF value. We show that aggregating across multiple scales is critical to overcome the approximations introduced by the serialization (i.e., false negatives in the neighborhood). Our framework sets a new state of the art in terms of accuracy and efficiency (better or similar performance with half the latency of the best prior method, coupled with a simpler implementation), particularly on outdoor datasets where sparse-grid methods have shown limited performance.
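
Serializing a point cloud into a locality-preserving sequence can be illustrated with a Z-order (Morton) curve, one common space-filling curve for this purpose. The sketch below is a simplified stand-in rather than the NoKSR or PointTransformerV3 implementation: the grid cell size, the bit depth, and the assumption of non-negative coordinates are illustrative choices.

```python
def morton_code(ix, iy, iz, bits=10):
    """Interleave the bits of three non-negative grid indices into one key."""
    code = 0
    for b in range(bits):
        code |= ((ix >> b) & 1) << (3 * b)
        code |= ((iy >> b) & 1) << (3 * b + 1)
        code |= ((iz >> b) & 1) << (3 * b + 2)
    return code

def serialize(points, cell_size, bits=10):
    """Return point indices ordered along the Z-order curve.

    Points are snapped to a grid of the given cell size first, so a coarser
    cell_size gives a coarser serialization level. Assumes coordinates >= 0.
    """
    keys = []
    for idx, (x, y, z) in enumerate(points):
        cell = (int(x // cell_size), int(y // cell_size), int(z // cell_size))
        keys.append((morton_code(*cell, bits=bits), idx))
    return [idx for _, idx in sorted(keys)]

points = [(0.1, 0.1, 0.1), (5.0, 5.0, 5.0), (0.2, 0.1, 0.1), (5.1, 5.0, 5.0)]
order = serialize(points, cell_size=1.0)
```

Points that are close in space tend to end up close in `order`, so a window over the sorted sequence yields fast approximate neighbors. Points near a cell boundary can still be split apart, which is the kind of false negative that serializing at several cell sizes is meant to compensate for.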

CV | Mar 22, 2020
Curved Buildings Reconstruction from Airborne LiDAR Data by Matching and Deforming Geometric Primitives

Jingwei Song, Shaobo Xia, Jun Wang et al.

Airborne LiDAR (Light Detection and Ranging) data is widely applied in building reconstruction, with studies reporting success on typical buildings. However, the reconstruction of curved buildings remains an open research problem. To this end, we propose a new framework for curved building reconstruction via assembling and deforming geometric primitives. The input LiDAR point clouds are first converted into contours, where individual buildings are identified. After recognizing geometric units (primitives) from building contours, we obtain initial models by matching basic geometric primitives to these primitives. To refine the assembled models, we employ a warping field. Specifically, an embedded deformation (ED) graph is constructed by downsampling the initial model. Then, the point-to-model displacements are minimized by adjusting node parameters in the ED graph based on our objective function. The presented framework is validated on several highly curved buildings collected by various LiDAR sensors in different cities. The experimental results, as well as the accuracy comparison, demonstrate the advantage and effectiveness of our method. This new insight leads to an efficient manner of reconstruction. Moreover, we show that the primitive-based framework significantly reduces the data storage to 10-20 percent of that of classical mesh models.