Wenting Duan

CV
h-index11
15papers
513citations
Novelty52%
AI Score61

15 Papers

IVAug 21, 2024Code
NuSegDG: Integration of Heterogeneous Space and Gaussian Kernel for Domain-Generalized Nuclei Segmentation

Zhenye Lou, Qing Xu, Zekun Jiang et al.

Domain-generalized nuclei segmentation refers to the generalizability of models to unseen domains based on knowledge learned from source domains and is challenged by various image conditions, cell types, and stain strategies. Recently, the Segment Anything Model (SAM) has made great success in universal image segmentation by interactive prompt modes (e.g., point and box). Despite its strengths, the original SAM presents limited adaptation to medical images. Moreover, SAM requires providing manual bounding box prompts for each object to produce satisfactory segmentation masks, so it is laborious in nuclei segmentation scenarios. To address these limitations, we propose a domain-generalizable framework for nuclei image segmentation, abbreviated to NuSegDG. Specifically, we first devise a Heterogeneous Space Adapter (HS-Adapter) to learn multi-dimensional feature representations of different nuclei domains by injecting a small number of trainable parameters into the image encoder of SAM. To alleviate the labor-intensive requirement of manual prompts, we introduce a Gaussian-Kernel Prompt Encoder (GKP-Encoder) to generate density maps driven by a single point, which guides segmentation predictions by mixing position prompts and semantic prompts. Furthermore, we present a Two-Stage Mask Decoder (TSM-Decoder) to effectively convert semantic masks to instance maps without the manual demand for morphological shape refinement. Based on our experimental evaluations, the proposed NuSegDG demonstrates state-of-the-art performance in nuclei instance segmentation, exhibiting superior domain generalization capabilities. The source code is available at https://github.com/xq141839/NuSegDG.

IVAug 23, 2023Code
SPPNet: A Single-Point Prompt Network for Nuclei Image Segmentation

Qing Xu, Wenwei Kuang, Zeyu Zhang et al.

Image segmentation plays an essential role in nuclei image analysis. Recently, the segment anything model has made a significant breakthrough in such tasks. However, the current model exists two major issues for cell segmentation: (1) the image encoder of the segment anything model involves a large number of parameters. Retraining or even fine-tuning the model still requires expensive computational resources. (2) in point prompt mode, points are sampled from the center of the ground truth and more than one set of points is expected to achieve reliable performance, which is not efficient for practical applications. In this paper, a single-point prompt network is proposed for nuclei image segmentation, called SPPNet. We replace the original image encoder with a lightweight vision transformer. Also, an effective convolutional block is added in parallel to extract the low-level semantic information from the image and compensate for the performance degradation due to the small image encoder. We propose a new point-sampling method based on the Gaussian kernel. The proposed model is evaluated on the MoNuSeg-2018 dataset. The result demonstrated that SPPNet outperforms existing U-shape architectures and shows faster convergence in training. Compared to the segment anything model, SPPNet shows roughly 20 times faster inference, with 1/70 parameters and computational cost. Particularly, only one set of points is required in both the training and inference phases, which is more reasonable for clinical applications. The code for our work and more technical details can be found at https://github.com/xq141839/SPPNet.

IVJul 19, 2024Code
De-LightSAM: Modality-Decoupled Lightweight SAM for Generalizable Medical Segmentation

Qing Xu, Jiaxuan Li, Xiangjian He et al.

The universality of deep neural networks across different modalities and their generalization capabilities to unseen domains play an essential role in medical image segmentation. The recent segment anything model (SAM) has demonstrated strong adaptability across diverse natural scenarios. However, the huge computational costs, demand for manual annotations as prompts and conflict-prone decoding process of SAM degrade its generalization capabilities in medical scenarios. To address these limitations, we propose a modality-decoupled lightweight SAM for domain-generalized medical image segmentation, named De-LightSAM. Specifically, we first devise a lightweight domain-controllable image encoder (DC-Encoder) that produces discriminative visual features for diverse modalities. Further, we introduce the self-patch prompt generator (SP-Generator) to automatically generate high-quality dense prompt embeddings for guiding segmentation decoding. Finally, we design the query-decoupled modality decoder (QM-Decoder) that leverages a one-to-one strategy to provide an independent decoding channel for every modality, preventing mutual knowledge interference of different modalities. Moreover, we design a multi-modal decoupled knowledge distillation (MDKD) strategy to leverage robust common knowledge to complement domain-specific medical feature representations. Extensive experiments indicate that De-LightSAM outperforms state-of-the-arts in diverse medical imaging segmentation tasks, displaying superior modality universality and generalization capabilities. Especially, De-LightSAM uses only 2.0% parameters compared to SAM-H. The source code is available at https://github.com/xq141839/De-LightSAM.

IVJun 23, 2023Code
DualAttNet: Synergistic Fusion of Image-level and Fine-Grained Disease Attention for Multi-Label Lesion Detection in Chest X-rays

Qing Xu, Wenting Duan

Chest radiographs are the most commonly performed radiological examinations for lesion detection. Recent advances in deep learning have led to encouraging results in various thoracic disease detection tasks. Particularly, the architecture with feature pyramid network performs the ability to recognise targets with different sizes. However, such networks are difficult to focus on lesion regions in chest X-rays due to their high resemblance in vision. In this paper, we propose a dual attention supervised module for multi-label lesion detection in chest radiographs, named DualAttNet. It efficiently fuses global and local lesion classification information based on an image-level attention block and a fine-grained disease attention algorithm. A binary cross entropy loss function is used to calculate the difference between the attention map and ground truth at image level. The generated gradient flow is leveraged to refine pyramid representations and highlight lesion-related features. We evaluate the proposed model on VinDr-CXR, ChestX-ray8 and COVID-19 datasets. The experimental results show that DualAttNet surpasses baselines by 0.6% to 2.7% mAP and 1.4% to 4.7% AP50 with different detection architectures. The code for our work and more technical details can be found at https://github.com/xq141839/DualAttNet.

CVDec 9, 2025Code
LapFM: A Laparoscopic Segmentation Foundation Model via Hierarchical Concept Evolving Pre-training

Qing Xu, Kun Yuan, Yuxiang Luo et al.

Surgical segmentation is pivotal for scene understanding yet remains hindered by annotation scarcity and semantic inconsistency across diverse procedures. Existing approaches typically fine-tune natural foundation models (e.g., SAM) with limited supervision, functioning merely as domain adapters rather than surgical foundation models. Consequently, they struggle to generalize across the vast variability of surgical targets. To bridge this gap, we present LapFM, a foundation model designed to evolve robust segmentation capabilities from massive unlabeled surgical images. Distinct from medical foundation models relying on inefficient self-supervised proxy tasks, LapFM leverages a Hierarchical Concept Evolving Pre-training paradigm. First, we establish a Laparoscopic Concept Hierarchy (LCH) via a hierarchical mask decoder with parent-child query embeddings, unifying diverse entities (i.e., Anatomy, Tissue, and Instrument) into a scalable knowledge structure with cross-granularity semantic consistency. Second, we propose a Confidence-driven Evolving Labeling that iteratively generates and filters pseudo-labels based on hierarchical consistency, progressively incorporating reliable samples from unlabeled images into training. This process yields LapBench-114K, a large-scale benchmark comprising 114K image-mask pairs. Extensive experiments demonstrate that LapFM significantly outperforms state-of-the-art methods, establishing new standards for granularity-adaptive generalization in universal laparoscopic segmentation. The source code is available at https://github.com/xq141839/LapFM.

CVDec 12, 2025Code
FreqDINO: Frequency-Guided Adaptation for Generalized Boundary-Aware Ultrasound Image Segmentation

Yixuan Zhang, Qing Xu, Yue Li et al.

Ultrasound image segmentation is pivotal for clinical diagnosis, yet challenged by speckle noise and imaging artifacts. Recently, DINOv3 has shown remarkable promise in medical image segmentation with its powerful representation capabilities. However, DINOv3, pre-trained on natural images, lacks sensitivity to ultrasound-specific boundary degradation. To address this limitation, we propose FreqDINO, a frequency-guided segmentation framework that enhances boundary perception and structural consistency. Specifically, we devise a Multi-scale Frequency Extraction and Alignment (MFEA) strategy to separate low-frequency structures and multi-scale high-frequency boundary details, and align them via learnable attention. We also introduce a Frequency-Guided Boundary Refinement (FGBR) module that extracts boundary prototypes from high-frequency components and refines spatial features. Furthermore, we design a Multi-task Boundary-Guided Decoder (MBGD) to ensure spatial coherence between boundary and semantic predictions. Extensive experiments demonstrate that FreqDINO surpasses state-of-the-art methods with superior achieves remarkable generalization capability. The code is at https://github.com/MingLang-FD/FreqDINO.

CVJan 1Code
RoLID-11K: A Dashcam Dataset for Small-Object Roadside Litter Detection

Tao Wu, Qing Xu, Xiangjian He et al.

Roadside litter poses environmental, safety and economic challenges, yet current monitoring relies on labour-intensive surveys and public reporting, providing limited spatial coverage. Existing vision datasets for litter detection focus on street-level still images, aerial scenes or aquatic environments, and do not reflect the unique characteristics of dashcam footage, where litter appears extremely small, sparse and embedded in cluttered road-verge backgrounds. We introduce RoLID-11K, the first large-scale dataset for roadside litter detection from dashcams, comprising over 11k annotated images spanning diverse UK driving conditions and exhibiting pronounced long-tail and small-object distributions. We benchmark a broad spectrum of modern detectors, from accuracy-oriented transformer architectures to real-time YOLO models, and analyse their strengths and limitations on this challenging task. Our results show that while CO-DETR and related transformers achieve the best localisation accuracy, real-time models remain constrained by coarse feature hierarchies. RoLID-11K establishes a challenging benchmark for extreme small-object detection in dynamic driving scenes and aims to support the development of scalable, low-cost systems for roadside-litter monitoring. The dataset is available at https://github.com/xq141839/RoLID-11K.

CVNov 15, 2025Code
TM-UNet: Token-Memory Enhanced Sequential Modeling for Efficient Medical Image Segmentation

Yaxuan Jiao, Qing Xu, Yuxiang Luo et al.

Medical image segmentation is essential for clinical diagnosis and treatment planning. Although transformer-based methods have achieved remarkable results, their high computational cost hinders clinical deployment. To address this issue, we propose TM-UNet, a novel lightweight framework that integrates token sequence modeling with an efficient memory mechanism for efficient medical segmentation. Specifically, we introduce a multi-scale token-memory (MSTM) block that transforms 2D spatial features into token sequences through strategic spatial scanning, leveraging matrix memory cells to selectively retain and propagate discriminative contextual information across tokens. This novel token-memory mechanism acts as a dynamic knowledge store that captures long-range dependencies with linear complexity, enabling efficient global reasoning without redundant computation. Our MSTM block further incorporates exponential gating to identify token effectiveness and multi-scale contextual extraction via parallel pooling operations, enabling hierarchical representation learning without computational overhead. Extensive experiments demonstrate that TM-UNet outperforms state-of-the-art methods across diverse medical segmentation tasks with substantially reduced computation cost. The code is available at https://github.com/xq141839/TM-UNet.

CVJun 20, 2025Code
Co-Seg++: Mutual Prompt-Guided Collaborative Learning for Versatile Medical Segmentation

Qing Xu, Yuxiang Luo, Wenting Duan et al.

Medical image analysis is critical yet challenged by the need of jointly segmenting organs or tissues, and numerous instances for anatomical structures and tumor microenvironment analysis. Existing studies typically formulated different segmentation tasks in isolation, which overlooks the fundamental interdependencies between these tasks, leading to suboptimal segmentation performance and insufficient medical image understanding. To address this issue, we propose a Co-Seg++ framework for versatile medical segmentation. Specifically, we introduce a novel co-segmentation paradigm, allowing semantic and instance segmentation tasks to mutually enhance each other. We first devise a spatio-temporal prompt encoder (STP-Encoder) to capture long-range spatial and temporal relationships between segmentation regions and image embeddings as prior spatial constraints. Moreover, we devise a multi-task collaborative decoder (MTC-Decoder) that leverages cross-guidance to strengthen the contextual consistency of both tasks, jointly computing semantic and instance segmentation masks. Extensive experiments on diverse CT and histopathology datasets demonstrate that the proposed Co-Seg++ outperforms state-of-the-arts in the semantic, instance, and panoptic segmentation of dental anatomical structures, histopathology tissues, and nuclei instances. The source code is available at https://github.com/xq141839/Co-Seg-Plus.

CVDec 4, 2025
SP-Det: Self-Prompted Dual-Text Fusion for Generalized Multi-Label Lesion Detection

Qing Xu, Yanqian Wang, Xiangjian Hea et al.

Automated lesion detection in chest X-rays has demonstrated significant potential for improving clinical diagnosis by precisely localizing pathological abnormalities. While recent promptable detection frameworks have achieved remarkable accuracy in target localization, existing methods typically rely on manual annotations as prompts, which are labor-intensive and impractical for clinical applications. To address this limitation, we propose SP-Det, a novel self-prompted detection framework that automatically generates rich textual context to guide multi-label lesion detection without requiring expert annotations. Specifically, we introduce an expert-free dual-text prompt generator (DTPG) that leverages two complementary textual modalities: semantic context prompts that capture global pathological patterns and disease beacon prompts that focus on disease-specific manifestations. Moreover, we devise a bidirectional feature enhancer (BFE) that synergistically integrates comprehensive diagnostic context with disease-specific embeddings to significantly improve feature representation and detection accuracy. Extensive experiments on two chest X-ray datasets with diverse thoracic disease categories demonstrate that our SP-Det framework outperforms state-of-the-art detection methods while completely eliminating the dependency on expert-annotated prompts compared to existing promptable architectures.

CVSep 8, 2025Code
Co-Seg: Mutual Prompt-Guided Collaborative Learning for Tissue and Nuclei Segmentation

Qing Xu, Wenting Duan, Zhen Chen

Histopathology image analysis is critical yet challenged by the demand of segmenting tissue regions and nuclei instances for tumor microenvironment and cellular morphology analysis. Existing studies focused on tissue semantic segmentation or nuclei instance segmentation separately, but ignored the inherent relationship between these two tasks, resulting in insufficient histopathology understanding. To address this issue, we propose a Co-Seg framework for collaborative tissue and nuclei segmentation. Specifically, we introduce a novel co-segmentation paradigm, allowing tissue and nuclei segmentation tasks to mutually enhance each other. To this end, we first devise a region-aware prompt encoder (RP-Encoder) to provide high-quality semantic and instance region prompts as prior constraints. Moreover, we design a mutual prompt mask decoder (MP-Decoder) that leverages cross-guidance to strengthen the contextual consistency of both tasks, collaboratively computing semantic and instance segmentation masks. Extensive experiments on the PUMA dataset demonstrate that the proposed Co-Seg surpasses state-of-the-arts in the semantic, instance and panoptic segmentation of tumor tissues and nuclei instances. The source code is available at https://github.com/xq141839/Co-Seg.

CVOct 12, 2025Code
MSM-Seg: A Modality-and-Slice Memory Framework with Category-Agnostic Prompting for Multi-Modal Brain Tumor Segmentation

Yuxiang Luo, Qing Xu, Hai Huang et al.

Multi-modal brain tumor segmentation is critical for clinical diagnosis, and it requires accurate identification of distinct internal anatomical subregions. While the recent prompt-based segmentation paradigms enable interactive experiences for clinicians, existing methods ignore cross-modal correlations and rely on labor-intensive category-specific prompts, limiting their applicability in real-world scenarios. To address these issues, we propose a MSM-Seg framework for multi-modal brain tumor segmentation. The MSM-Seg introduces a novel dual-memory segmentation paradigm that synergistically integrates multi-modal and inter-slice information with the efficient category-agnostic prompt for brain tumor understanding. To this end, we first devise a modality-and-slice memory attention (MSMA) to exploit the cross-modal and inter-slice relationships among the input scans. Then, we propose a multi-scale category-agnostic prompt encoder (MCP-Encoder) to provide tumor region guidance for decoding. Moreover, we devise a modality-adaptive fusion decoder (MF-Decoder) that leverages the complementary decoding information across different modalities to improve segmentation accuracy. Extensive experiments on different MRI datasets demonstrate that our MSM-Seg framework outperforms state-of-the-art methods in multi-modal metastases and glioma tumor segmentation. The code is available at https://github.com/xq141839/MSM-Seg.

IVApr 8, 2025Code
HER-Seg: Holistically Efficient Segmentation for High-Resolution Medical Images

Qing Xu, Zhenye Lou, Chenxin Li et al.

High-resolution segmentation is critical for precise disease diagnosis by extracting fine-grained morphological details. Existing hierarchical encoder-decoder frameworks have demonstrated remarkable adaptability across diverse medical segmentation tasks. While beneficial, they usually require the huge computation and memory cost when handling large-size segmentation, which limits their applications in foundation model building and real-world clinical scenarios. To address this limitation, we propose a holistically efficient framework for high-resolution medical image segmentation, called HER-Seg. Specifically, we first devise a computation-efficient image encoder (CE-Encoder) to model long-range dependencies with linear complexity while maintaining sufficient representations. In particular, we introduce the dual-gated linear attention (DLA) mechanism to perform cascaded token filtering, selectively retaining important tokens while ignoring irrelevant ones to enhance attention computation efficiency. Then, we introduce a memory-efficient mask decoder (ME-Decoder) to eliminate the demand for the hierarchical structure by leveraging cross-scale segmentation decoding. Extensive experiments reveal that HER-Seg outperforms state-of-the-arts in high-resolution medical 2D, 3D and video segmentation tasks. In particular, our HER-Seg requires only 0.59GB training GPU memory and 9.39G inference FLOPs per 1024$\times$1024 image, demonstrating superior memory and computation efficiency. The code is available at https://github.com/xq141839/HER-Seg.

IVFeb 2, 2022Code
DCSAU-Net: A Deeper and More Compact Split-Attention U-Net for Medical Image Segmentation

Qing Xu, Zhicheng Ma, Na HE et al.

Deep learning architecture with convolutional neural network (CNN) achieves outstanding success in the field of computer vision. Where U-Net, an encoder-decoder architecture structured by CNN, makes a great breakthrough in biomedical image segmentation and has been applied in a wide range of practical scenarios. However, the equal design of every downsampling layer in the encoder part and simply stacked convolutions do not allow U-Net to extract sufficient information of features from different depths. The increasing complexity of medical images brings new challenges to the existing methods. In this paper, we propose a deeper and more compact split-attention u-shape network (DCSAU-Net), which efficiently utilises low-level and high-level semantic information based on two novel frameworks: primary feature conservation and compact split-attention block. We evaluate the proposed model on CVC-ClinicDB, 2018 Data Science Bowl, ISIC-2018 and SegPC-2021 datasets. As a result, DCSAU-Net displays better performance than other state-of-the-art (SOTA) methods in terms of the mean Intersection over Union (mIoU) and F1-socre. More significantly, the proposed model demonstrates excellent segmentation performance on challenging images. The code for our work and more technical details can be found at https://github.com/xq141839/DCSAU-Net.

IVNov 4, 2020Code
DR-Unet104 for Multimodal MRI brain tumor segmentation

Jordan Colman, Lei Zhang, Wenting Duan et al.

In this paper we propose a 2D deep residual Unet with 104 convolutional layers (DR-Unet104) for lesion segmentation in brain MRIs. We make multiple additions to the Unet architecture, including adding the 'bottleneck' residual block to the Unet encoder and adding dropout after each convolution block stack. We verified the effect of introducing the regularisation of dropout with small rate (e.g. 0.2) on the architecture, and found a dropout of 0.2 improved the overall performance compared to no dropout, or a dropout of 0.5. We evaluated the proposed architecture as part of the Multimodal Brain Tumor Segmentation (BraTS) 2020 Challenge and compared our method to DeepLabV3+ with a ResNet-V2-152 backbone. We found that the DR-Unet104 achieved a mean dice score coefficient of 0.8862, 0.6756 and 0.6721 for validation data, whole tumor, enhancing tumor and tumor core respectively, an overall improvement on 0.8770, 0.65242 and 0.68134 achieved by DeepLabV3+. Our method produced a final mean DSC of 0.8673, 0.7514 and 0.7983 on whole tumor, enhancing tumor and tumor core on the challenge's testing data. We produced a competitive lesion segmentation architecture, despite only 2D convolutions, having the added benefit that it can be used on lower power computers than a 3D architecture. The source code and trained model for this work is openly available at https://github.com/jordan-colman/DR-Unet104.