Xiangzhi Bai

CV
h-index8
7papers
91citations
Novelty52%
AI Score49

7 Papers

CVApr 17, 2023Code
Learning to "Segment Anything" in Thermal Infrared Images through Knowledge Distillation with a Large Scale Dataset SATIR

Junzhang Chen, Xiangzhi Bai

The Segment Anything Model (SAM) is a promptable segmentation model recently introduced by Meta AI that has demonstrated its prowess across various fields beyond just image segmentation. SAM can accurately segment images across diverse fields, and generating various masks. We discovered that this ability of SAM can be leveraged to pretrain models for specific fields. Accordingly, we have proposed a framework that utilizes SAM to generate pseudo labels for pretraining thermal infrared image segmentation tasks. Our proposed framework can effectively improve the accuracy of segmentation results of specific categories beyond the SOTA ImageNet pretrained model. Our framework presents a novel approach to collaborate with models trained with large data like SAM to address problems in special fields. Also, we generated a large scale thermal infrared segmentation dataset used for pretaining, which contains over 100,000 images with pixel-annotation labels. This approach offers an effective solution for working with large models in special fields where label annotation is challenging. Our code is available at https://github.com/chenjzBUAA/SATIR

IVNov 14, 2023Code
SAMIHS: Adaptation of Segment Anything Model for Intracranial Hemorrhage Segmentation

Yinuo Wang, Kai Chen, Weimin Yuan et al.

Segment Anything Model (SAM), a vision foundation model trained on large-scale annotations, has recently continued raising awareness within medical image segmentation. Despite the impressive capabilities of SAM on natural scenes, it struggles with performance decline when confronted with medical images, especially those involving blurry boundaries and highly irregular regions of low contrast. In this paper, a SAM-based parameter-efficient fine-tuning method, called SAMIHS, is proposed for intracranial hemorrhage segmentation, which is a crucial and challenging step in stroke diagnosis and surgical planning. Distinguished from previous SAM and SAM-based methods, SAMIHS incorporates parameter-refactoring adapters into SAM's image encoder and considers the efficient and flexible utilization of adapters' parameters. Additionally, we employ a combo loss that combines binary cross-entropy loss and boundary-sensitive loss to enhance SAMIHS's ability to recognize the boundary regions. Our experimental results on two public datasets demonstrate the effectiveness of our proposed method. Code is available at https://github.com/mileswyn/SAMIHS .

CVSep 12, 2024Code
Thermal3D-GS: Physics-induced 3D Gaussians for Thermal Infrared Novel-view Synthesis

Qian Chen, Shihao Shu, Xiangzhi Bai

Novel-view synthesis based on visible light has been extensively studied. In comparison to visible light imaging, thermal infrared imaging offers the advantage of all-weather imaging and strong penetration, providing increased possibilities for reconstruction in nighttime and adverse weather scenarios. However, thermal infrared imaging is influenced by physical characteristics such as atmospheric transmission effects and thermal conduction, hindering the precise reconstruction of intricate details in thermal infrared scenes, manifesting as issues of floaters and indistinct edge features in synthesized images. To address these limitations, this paper introduces a physics-induced 3D Gaussian splatting method named Thermal3D-GS. Thermal3D-GS begins by modeling atmospheric transmission effects and thermal conduction in three-dimensional media using neural networks. Additionally, a temperature consistency constraint is incorporated into the optimization objective to enhance the reconstruction accuracy of thermal infrared images. Furthermore, to validate the effectiveness of our method, the first large-scale benchmark dataset for this field named Thermal Infrared Novel-view Synthesis Dataset (TI-NSD) is created. This dataset comprises 20 authentic thermal infrared video scenes, covering indoor, outdoor, and UAV(Unmanned Aerial Vehicle) scenarios, totaling 6,664 frames of thermal infrared image data. Based on this dataset, this paper experimentally verifies the effectiveness of Thermal3D-GS. The results indicate that our method outperforms the baseline method with a 3.03 dB improvement in PSNR and significantly addresses the issues of floaters and indistinct edge features present in the baseline method. Our dataset and codebase will be released in \href{https://github.com/mzzcdf/Thermal3DGS}{\textcolor{red}{Thermal3DGS}}.

CVJan 5Code
Prior-Guided DETR for Ultrasound Nodule Detection

Jingjing Wang, Zhuo Xiao, Xinning Yao et al.

Accurate detection of ultrasound nodules is essential for the early diagnosis and treatment of thyroid and breast cancers. However, this task remains challenging due to irregular nodule shapes, indistinct boundaries, substantial scale variations, and the presence of speckle noise that degrades structural visibility. To address these challenges, we propose a prior-guided DETR framework specifically designed for ultrasound nodule detection. Instead of relying on purely data-driven feature learning, the proposed framework progressively incorporates different prior knowledge at multiple stages of the network. First, a Spatially-adaptive Deformable FFN with Prior Regularization (SDFPR) is embedded into the CNN backbone to inject geometric priors into deformable sampling, stabilizing feature extraction for irregular and blurred nodules. Second, a Multi-scale Spatial-Frequency Feature Mixer (MSFFM) is designed to extract multi-scale structural priors, where spatial-domain processing emphasizes contour continuity and boundary cues, while frequency-domain modeling captures global morphology and suppresses speckle noise. Furthermore, a Dense Feature Interaction (DFI) mechanism propagates and exploits these prior-modulated features across all encoder layers, enabling the decoder to enhance query refinement under consistent geometric and structural guidance. Experiments conducted on two clinically collected thyroid ultrasound datasets (Thyroid I and Thyroid II) and two public benchmarks (TN3K and BUSI) for thyroid and breast nodules demonstrate that the proposed method achieves superior accuracy compared with 18 detection methods, particularly in detecting morphologically complex nodules.The source code is publicly available at https://github.com/wjj1wjj/Ultrasound-DETR.

CVJan 25, 2024Code
MambaMorph: a Mamba-based Framework for Medical MR-CT Deformable Registration

Tao Guo, Yinuo Wang, Shihao Shu et al.

Capturing voxel-wise spatial correspondence across distinct modalities is crucial for medical image analysis. However, current registration approaches are not practical enough in terms of registration accuracy and clinical applicability. In this paper, we introduce MambaMorph, a novel multi-modality deformable registration framework. Specifically, MambaMorph utilizes a Mamba-based registration module and a fine-grained, yet simple, feature extractor for efficient long-range correspondence modeling and high-dimensional feature learning, respectively. Additionally, we develop a well-annotated brain MR-CT registration dataset, SR-Reg, to address the scarcity of data in multi-modality registration. To validate MambaMorph's multi-modality registration capabilities, we conduct quantitative experiments on both our SR-Reg dataset and a public T1-T2 dataset. The experimental results on both datasets demonstrate that MambaMorph significantly outperforms the current state-of-the-art learning-based registration methods in terms of registration accuracy. Further study underscores the efficiency of the Mamba-based registration module and the lightweight feature extractor, which achieve notable registration quality while maintaining reasonable computational costs and speeds. We believe that MambaMorph holds significant potential for practical applications in medical image registration. The code for MambaMorph is available at: https://github.com/Guo-Stone/MambaMorph.

CVOct 19, 2025
GS2POSE: Marry Gaussian Splatting to 6D Object Pose Estimation

Junbo Li, Weimin Yuan, Yinuo Wang et al.

Accurate 6D pose estimation of 3D objects is a fundamental task in computer vision, and current research typically predicts the 6D pose by establishing correspondences between 2D image features and 3D model features. However, these methods often face difficulties with textureless objects and varying illumination conditions. To overcome these limitations, we propose GS2POSE, a novel approach for 6D object pose estimation. GS2POSE formulates a pose regression algorithm inspired by the principles of Bundle Adjustment (BA). By leveraging Lie algebra, we extend the capabilities of 3DGS to develop a pose-differentiable rendering pipeline, which iteratively optimizes the pose by comparing the input image to the rendered image. Additionally, GS2POSE updates color parameters within the 3DGS model, enhancing its adaptability to changes in illumination. Compared to previous models, GS2POSE demonstrates accuracy improvements of 1.4\%, 2.8\% and 2.5\% on the T-LESS, LineMod-Occlusion and LineMod datasets, respectively.

CVMay 30, 2023
Infrared Image Deturbulence Restoration Using Degradation Parameter-Assisted Wide & Deep Learning

Yi Lu, Yadong Wang, Xingbo Jiang et al.

Infrared images captured under turbulent conditions are degraded by complex geometric distortions and blur. We address infrared deturbulence as an image restoration task, proposing DparNet, a parameter-assisted multi-frame network with a wide & deep architecture. DparNet learns a degradation prior (key parameter matrix) directly from degraded images without external knowledge. Its wide & deep architecture uses these learned parameters to directly modulate restoration, achieving spatially and intensity adaptive results. Evaluated on dedicated infrared deturbulence (49,744 images) and visible image denoising (109,536 images) datasets, DparNet significantly outperforms State-of-the-Art (SOTA) methods in restoration performance and efficiency. Notably, leveraging these parameters improves PSNR by 0.6-1.1 dB with less than 2% increase in model parameters and computational complexity. Our work demonstrates that degraded images hide key degradation information that can be learned and utilized to boost adaptive image restoration.