Xiaoxiang Han

IV
h-index: 11
7 papers
68 citations
Novelty: 50%
AI Score: 31

7 Papers

CV · Dec 16, 2022 · Code
LOANet: A Lightweight Network Using Object Attention for Extracting Buildings and Roads from UAV Aerial Remote Sensing Images

Xiaoxiang Han, Yiman Liu, Gang Liu et al.

Semantic segmentation with deep learning has become a more efficient and convenient way to extract buildings and roads from uncrewed aerial vehicle (UAV) remote sensing images than traditional manual segmentation in surveying and mapping. To keep the model lightweight while improving accuracy, a Lightweight Network Using Object Attention (LOANet) for extracting buildings and roads from UAV aerial remote sensing images is proposed. The proposed network adopts an encoder-decoder architecture in which a Lightweight Densely Connected Network (LDCNet) is developed as the encoder. In the decoder, dual multi-scale context modules, consisting of the Atrous Spatial Pyramid Pooling (ASPP) module and the Object Attention Module (OAM), are designed to capture more context information from the feature maps of UAV remote sensing images. Between ASPP and OAM, a Feature Pyramid Network (FPN) module fuses the multi-scale features extracted by ASPP. A private dataset of remote sensing images taken by UAV, containing 2431 training sets, 945 validation sets, and 475 test sets, is constructed. The proposed basic model performs well on this dataset, achieving excellent mean Intersection-over-Union (mIoU) with only 1.4M parameters and 5.48G floating point operations (FLOPs). Further experiments on the publicly available LoveDA and CITY-OSM datasets validate the effectiveness of the proposed basic and large models, which achieve outstanding mIoU results. All code is available at https://github.com/GtLinyer/LOANet.
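For readers unfamiliar with the mIoU metric reported above, the following is a minimal sketch of how it is commonly computed. The flattened label lists and the class ids (0 = background, 1 = building, 2 = road) are illustrative assumptions, not taken from the LOANet code.

```python
def mean_iou(pred, target, num_classes):
    """Mean Intersection-over-Union, averaged over classes that occur
    in the prediction or the ground truth."""
    ious = []
    for c in range(num_classes):
        # Pixels where both prediction and ground truth are class c.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels where either prediction or ground truth is class c.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # skip classes absent from both maps
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Hypothetical flattened label maps: 0 = background, 1 = building, 2 = road.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
miou = mean_iou(pred, target, num_classes=3)
```

In this toy case the per-class IoUs are 1/3, 2/3, and 1/2, so the mean is 0.5. Some evaluation scripts accumulate intersections and unions over the whole dataset before dividing; this sketch scores a single label map.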

IV · Feb 27, 2023
EDMAE: An Efficient Decoupled Masked Autoencoder for Standard View Identification in Pediatric Echocardiography

Yiman Liu, Xiaoxiang Han, Tongtong Liang et al.

This paper introduces the Efficient Decoupled Masked Autoencoder (EDMAE), a novel self-supervised method for recognizing standard views in pediatric echocardiography. EDMAE introduces a new proxy task based on the encoder-decoder structure. The EDMAE encoder is composed of a teacher encoder and a student encoder. The teacher encoder extracts the latent representation of the masked image blocks, while the student encoder extracts the latent representation of the visible image blocks. A loss is computed between the feature maps output by the two encoders to keep the latent representations they extract consistent. EDMAE uses pure convolution operations instead of the ViT structure in the MAE encoder, which improves training efficiency and convergence speed. EDMAE is pre-trained on a large-scale private dataset of pediatric echocardiography using self-supervised learning and then fine-tuned for standard view recognition. The proposed method achieves high classification accuracy on 27 standard views of pediatric echocardiography. To further verify its effectiveness, the authors perform a second downstream task, cardiac ultrasound segmentation, on the public CAMUS dataset. The experimental results demonstrate that the proposed method outperforms some popular supervised and recent self-supervised methods and is more competitive across different downstream tasks.

IV · Jun 6, 2023
Atrial Septal Defect Detection in Children Based on Ultrasound Video Using Multiple Instances Learning

Yiman Liu, Qiming Huang, Xiaoxiang Han et al.

Purpose: Congenital heart defect (CHD) is the most common birth defect. Transthoracic echocardiography (TTE) provides sufficient information on cardiac structure, supports evaluation of hemodynamics and cardiac function, and is an effective method for atrial septal defect (ASD) examination. This paper aims to study a deep learning method based on cardiac ultrasound video to assist in ASD diagnosis. Materials and methods: We select two standard views, the atrial septum view (subAS) and the low parasternal four-chamber view (LPS4C), to identify ASD. We enlist data from 300 pediatric patients as part of a double-blind experiment and use five-fold cross-validation to verify the performance of our model. In addition, data from 30 pediatric patients (15 positive and 15 negative) are collected for clinician testing and compared with our model's test results (these 30 samples do not participate in model training). We propose an echocardiography video-based atrial septal defect diagnosis system. In our model, we present block random selection, maximal agreement decision, and frame sampling strategies for training and testing, respectively. ResNet18 and R3D networks are used to extract frame features and aggregate them into a rich video-level representation. Results: We validate our model on our private dataset using five-fold cross-validation. For ASD detection, we achieve 89.33 AUC, 84.95 accuracy, 85.70 sensitivity, 81.51 specificity, and 81.99 F1 score. Conclusion: The proposed model is a multiple-instance-learning-based deep learning model for video-based atrial septal defect detection that effectively improves ASD detection accuracy compared with previous networks and clinical doctors.
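The abstract does not spell out how blocks are selected or how agreement is decided, so the following is only a rough sketch of the general idea: sample several fixed-length frame blocks from a video, classify each block, and take the label most blocks agree on. The function names, the block parameters, and the use of a plain majority vote are all assumptions for illustration, not the authors' implementation.

```python
import random
from collections import Counter


def sample_blocks(num_frames, block_len, num_blocks, seed=0):
    """Randomly choose `num_blocks` contiguous frame-index blocks.
    Assumes num_frames >= block_len."""
    rng = random.Random(seed)
    max_start = num_frames - block_len
    starts = [rng.randint(0, max_start) for _ in range(num_blocks)]
    return [list(range(s, s + block_len)) for s in starts]


def majority_vote(block_labels):
    """Return the label predicted for the largest number of blocks
    (assumed stand-in for a 'maximal agreement' decision)."""
    label, _count = Counter(block_labels).most_common(1)[0]
    return label


# Hypothetical usage: 5 blocks of 16 frames from a 120-frame clip, with
# made-up per-block predictions (1 = ASD, 0 = normal).
blocks = sample_blocks(num_frames=120, block_len=16, num_blocks=5)
video_label = majority_vote([1, 0, 1, 1, 0])
```

In a real pipeline each block would be passed through the frame-level and clip-level networks to obtain its prediction; here the per-block labels are supplied by hand.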

IV · Sep 28, 2024 · Code
MambaEviScrib: Mamba and Evidence-Guided Consistency Enhance CNN Robustness for Scribble-Based Weakly Supervised Ultrasound Image Segmentation

Xiaoxiang Han, Xinyu Li, Jiang Shang et al.

Segmenting anatomical structures and lesions from ultrasound images contributes to disease assessment. Weakly supervised learning (WSL) based on sparse annotation has achieved encouraging performance and demonstrated the potential to reduce annotation costs. This study attempts to introduce scribble-based WSL into ultrasound image segmentation tasks. However, ultrasound images often suffer from poor contrast and unclear edges, and scribbles provide insufficient supervision signals for edges, which makes edge prediction challenging. Uncertainty modeling has been shown to help models deal with these issues. Nevertheless, existing uncertainty estimation paradigms are not robust enough and often filter out predictions near decision boundaries, resulting in unstable edge predictions. Therefore, we propose leveraging predictions near decision boundaries effectively. Specifically, we introduce Dempster-Shafer Theory (DST) of evidence to design an Evidence-Guided Consistency (EGC) strategy. This strategy uses high-evidence predictions, which are more likely to occur near high-density regions, to guide the optimization of low-evidence predictions that may appear near decision boundaries. Furthermore, the diverse sizes and locations of lesions in ultrasound images pose a challenge for CNNs with local receptive fields, as they struggle to model global information. Therefore, we introduce Visual Mamba, based on structured state space sequence models, which captures long-range dependencies with linear computational complexity, and we construct a novel hybrid CNN-Mamba framework. During training, the CNN branch and the Mamba branch of the proposed framework guide each other through the EGC strategy. Experiments demonstrate the competitiveness of the proposed method. Dataset and code will be available at https://github.com/GtLinyer/MambaEviScrib.
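To make the notion of "high-evidence" and "low-evidence" predictions concrete, here is a minimal sketch of the subjective-logic formulation widely used in evidential deep learning: non-negative per-class evidence e_k gives Dirichlet parameters alpha_k = e_k + 1, belief masses b_k = e_k / S, and an uncertainty mass u = K / S with S the sum of all alpha_k. Whether MambaEviScrib uses exactly this form is an assumption; the function name and the example values are hypothetical.

```python
def opinion(evidence):
    """Map non-negative per-class evidence to belief masses and an
    uncertainty mass, following the common subjective-logic form."""
    num_classes = len(evidence)
    alpha = [e + 1.0 for e in evidence]   # Dirichlet parameters
    strength = sum(alpha)                 # Dirichlet strength S
    belief = [e / strength for e in evidence]
    uncertainty = num_classes / strength  # u = K / S
    return belief, uncertainty


# Hypothetical pixels: strong evidence for class 0 vs. no evidence at all.
confident_belief, confident_u = opinion([8.0, 0.0])
unsure_belief, unsure_u = opinion([0.0, 0.0])
```

Belief masses and the uncertainty mass always sum to one, so a pixel with little total evidence (as might occur near a decision boundary) receives an uncertainty close to 1, while a pixel with a lot of evidence receives an uncertainty close to 0.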

IV · Mar 9, 2023
Reconstruction of Cardiac Cine MRI Using Motion-Guided Deformable Alignment and Multi-Resolution Fusion

Xiaoxiang Han, Yang Chen, Qiaohong Liu et al.

Cardiac cine magnetic resonance imaging (MRI) is an important means of assessing cardiac function and vascular abnormalities. It is important both to mitigate artifacts arising during image reconstruction and to accelerate cardiac cine MRI acquisition while still obtaining high-quality images. A novel end-to-end deep learning network is developed to improve cardiac cine MRI reconstruction. First, a U-Net is adopted to obtain the initial reconstructed images in k-space. Next, to remove motion artifacts, a motion-guided deformable alignment (MGDA) module with second-order bidirectional propagation is introduced to align adjacent cine MRI frames by maximizing the use of spatial-temporal information. Finally, a multi-resolution fusion (MRF) module is designed to correct the blur and artifacts introduced by the alignment operation and to produce the final high-quality reconstructed cardiac images. At an 8$\times$ acceleration rate, the numerical measurements on the ACDC dataset are structural similarity index (SSIM) of 78.40%$\pm$.57%, peak signal-to-noise ratio (PSNR) of 30.46$\pm$1.22dB, and normalized mean squared error (NMSE) of 0.0468$\pm$0.0075. On the ACMRI dataset, the results are SSIM of 87.65%$\pm$4.20%, PSNR of 30.04$\pm$1.18dB, and NMSE of 0.0473$\pm$0.0072. The proposed method yields high-quality results with richer details and fewer artifacts for cardiac cine MRI reconstruction at different acceleration rates.
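Two of the metrics quoted above have short closed forms: PSNR = 10 log10(MAX^2 / MSE) and NMSE = ||x - y||^2 / ||x||^2, where x is the reference image and y the reconstruction. The sketch below computes both on flattened pixel lists. The peak value of 1.0 and the toy pixel values are assumptions, and the paper's exact normalization may differ.

```python
import math


def psnr(reference, reconstruction, max_value=1.0):
    """Peak signal-to-noise ratio in dB for two equal-length pixel lists."""
    mse = sum((r - x) ** 2 for r, x in zip(reference, reconstruction)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def nmse(reference, reconstruction):
    """Squared error normalized by the energy of the reference."""
    err = sum((r - x) ** 2 for r, x in zip(reference, reconstruction))
    energy = sum(r ** 2 for r in reference)
    return err / energy


# Hypothetical flattened images with intensities in [0, 1].
ref = [0.0, 0.5, 1.0, 0.5]
rec = [0.1, 0.5, 0.9, 0.5]
psnr_db = psnr(ref, rec)
nmse_value = nmse(ref, rec)
```

Higher PSNR and lower NMSE both indicate a reconstruction closer to the reference. SSIM is omitted here because it requires local windowed statistics.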

IV · Dec 1, 2024
A Semi-Supervised Approach with Error Reflection for Echocardiography Segmentation

Xiaoxiang Han, Yiman Liu, Jiang Shang et al.

Segmenting internal structures from echocardiography is essential for the diagnosis and treatment of various heart diseases. Semi-supervised learning has shown its ability to alleviate annotation scarcity. While existing semi-supervised methods have been successful in image segmentation across various medical imaging modalities, few have been designed specifically to address the poor contrast, blurred edge details, and noise of echocardiography. These characteristics make it difficult to generate high-quality pseudo-labels in semi-supervised segmentation based on Mean Teacher. Inspired by human reflection on erroneous practices, we devise an error reflection strategy for a semi-supervised echocardiography segmentation architecture. The process prompts the model to reflect on inaccuracies in the segmentation of unlabeled images, thereby enhancing the robustness of pseudo-label generation. Specifically, the strategy is divided into two steps. The first step is called reconstruction reflection. The network is tasked with reconstructing authentic proxy images from the semantic masks of unlabeled images and their auxiliary sketches, while maximizing the structural similarity between the original inputs and the proxies. The second step is called guidance correction. Reconstruction error maps decouple unreliable segmentation regions. Then, reliable data that are more likely to occur near high-density areas are leveraged to guide the optimization of unreliable data potentially located around decision boundaries. Additionally, we introduce an effective data augmentation strategy, termed the multi-scale mixing-up strategy, to minimize the empirical distribution gap between labeled and unlabeled images and to perceive diverse scales of cardiac anatomical structures. Extensive experiments demonstrate the competitiveness of the proposed method.
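The Mean Teacher framework mentioned above maintains a teacher model whose weights are an exponential moving average (EMA) of the student's weights, and the teacher's predictions on unlabeled images serve as pseudo-labels. A minimal sketch of that update follows; representing the parameters as a dictionary of floats and the decay value of 0.9 are simplifying assumptions for illustration, not details from the paper.

```python
def ema_update(teacher, student, decay=0.99):
    """In-place Mean Teacher update:
    teacher <- decay * teacher + (1 - decay) * student."""
    for name, student_value in student.items():
        teacher[name] = decay * teacher[name] + (1.0 - decay) * student_value
    return teacher


# Hypothetical single-parameter models.
teacher_params = {"w": 1.0}
student_params = {"w": 0.0}
ema_update(teacher_params, student_params, decay=0.9)  # teacher w moves toward 0.0
```

Because the teacher changes slowly, its pseudo-labels are smoother over training steps than the student's raw predictions. Noisy echocardiography can still degrade them, which is the gap the error reflection strategy targets.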

CV · Jan 2, 2025
Evidential Calibrated Uncertainty-Guided Interactive Segmentation paradigm for Ultrasound Images

Jiang Shang, Yuanmeng Wu, Xiaoxiang Han et al.

Accurate and robust ultrasound image segmentation is critical for computer-aided diagnostic systems. Nevertheless, the inherent challenges of ultrasound imaging, such as blurry boundaries and speckle noise, often degrade the performance of traditional segmentation methods. Despite recent advancements in universal image segmentation, such as the Segment Anything Model, existing interactive segmentation methods still suffer from inefficiency and lack of specialization. These methods rely heavily on extensive, accurate manual prompts or on randomly sampled prompts for interaction, requiring numerous prompts and iterations to reach satisfactory performance. In response to this challenge, we propose Evidential Uncertainty-Guided Interactive Segmentation (EUGIS), an end-to-end, efficient tiered interactive segmentation paradigm based on evidential uncertainty estimation for ultrasound image segmentation. Specifically, EUGIS harnesses evidence-based uncertainty estimation, grounded in Dempster-Shafer theory and Subjective Logic, to gauge the level of uncertainty in the model's predictions for different regions. By prioritizing sampling from high-uncertainty regions, our method can effectively simulate the interactive behavior of well-trained radiologists, making sampling more targeted while reducing the number of prompts and iterations required. Additionally, we propose a trainable calibration mechanism for uncertainty estimation, which further optimizes the boundary between certainty and uncertainty, thereby enhancing the confidence of uncertainty estimation.