CVSep 12, 2024Code
Lagrange Duality and Compound Multi-Attention Transformer for Semi-Supervised Medical Image SegmentationFuchen Zheng, Quanjun Li, Weixuan Li et al.
Medical image segmentation, a critical application of semantic segmentation in healthcare, has seen significant advancements through specialized computer vision techniques. While deep learning-based medical image segmentation is essential for assisting in medical diagnosis, the lack of diverse training data causes the long-tail problem. Moreover, most previous hybrid CNN-ViT architectures have limited ability to combine various attentions in different layers of the Convolutional Neural Network. To address these issues, we propose a Lagrange Duality Consistency (LDC) Loss, integrated with Boundary-Aware Contrastive Loss, as the overall training objective for semi-supervised learning to mitigate the long-tail problem. Additionally, we introduce CMAformer, a novel network that synergizes the strengths of ResUNet and Transformer. The cross-attention block in CMAformer effectively integrates spatial attention and channel attention for multi-scale feature fusion. Overall, our results indicate that CMAformer, combined with the feature fusion framework and the new consistency loss, demonstrates strong complementarity in semi-supervised learning ensembles. We achieve state-of-the-art results on multiple public medical image datasets. Example code are available at: \url{https://github.com/lzeeorno/Lagrange-Duality-and-CMAformer}.
CVDec 3, 2025Code
HBFormer: A Hybrid-Bridge Transformer for Microtumor and Miniature Organ SegmentationFuchen Zheng, Xinyi Chen, Weixuan Li et al.
Medical image segmentation is a cornerstone of modern clinical diagnostics. While Vision Transformers that leverage shifted window-based self-attention have established new benchmarks in this field, they are often hampered by a critical limitation: their localized attention mechanism struggles to effectively fuse local details with global context. This deficiency is particularly detrimental to challenging tasks such as the segmentation of microtumors and miniature organs, where both fine-grained boundary definition and broad contextual understanding are paramount. To address this gap, we propose HBFormer, a novel Hybrid-Bridge Transformer architecture. The 'Hybrid' design of HBFormer synergizes a classic U-shaped encoder-decoder framework with a powerful Swin Transformer backbone for robust hierarchical feature extraction. The core innovation lies in its 'Bridge' mechanism, a sophisticated nexus for multi-scale feature integration. This bridge is architecturally embodied by our novel Multi-Scale Feature Fusion (MFF) decoder. Departing from conventional symmetric designs, the MFF decoder is engineered to fuse multi-scale features from the encoder with global contextual information. It achieves this through a synergistic combination of channel and spatial attention modules, which are constructed from a series of dilated and depth-wise convolutions. These components work in concert to create a powerful feature bridge that explicitly captures long-range dependencies and refines object boundaries with exceptional precision. Comprehensive experiments on challenging medical image segmentation datasets, including multi-organ, liver tumor, and bladder tumor benchmarks, demonstrate that HBFormer achieves state-of-the-art results, showcasing its outstanding capabilities in microtumor and miniature organ segmentation. Code and models are available at: https://github.com/lzeeorno/HBFormer.
LGSep 12, 2024
GRE^2-MDCL: Graph Representation Embedding Enhanced via Multidimensional Contrastive LearningKaizhe Fan, Quanjun Li
Graph representation learning has emerged as a powerful tool for preserving graph topology when mapping nodes to vector representations, enabling various downstream tasks such as node classification and community detection. However, most current graph neural network models face the challenge of requiring extensive labeled data, which limits their practical applicability in real-world scenarios where labeled data is scarce. To address this challenge, researchers have explored Graph Contrastive Learning (GCL), which leverages enhanced graph data and contrastive learning techniques. While promising, existing GCL methods often struggle with effectively capturing both local and global graph structures, and balancing the trade-off between nodelevel and graph-level representations. In this work, we propose Graph Representation Embedding Enhanced via Multidimensional Contrastive Learning (GRE2-MDCL). Our model introduces a novel triple network architecture with a multi-head attention GNN as the core. GRE2-MDCL first globally and locally augments the input graph using SVD and LAGNN techniques. It then constructs a multidimensional contrastive loss, incorporating cross-network, cross-view, and neighbor contrast, to optimize the model. Extensive experiments on benchmark datasets Cora, Citeseer, and PubMed demonstrate that GRE2-MDCL achieves state-of-the-art performance, with average accuracies of 82.5%, 72.5%, and 81.6% respectively. Visualizations further show tighter intra-cluster aggregation and clearer inter-cluster boundaries, highlighting the effectiveness of our framework in improving upon baseline GCL models.
CVOct 13, 2025Code
DTEA: Dynamic Topology Weaving and Instability-Driven Entropic Attenuation for Medical Image SegmentationWeixuan Li, Quanjun Li, Guang Yu et al.
In medical image segmentation, skip connections are used to merge global context and reduce the semantic gap between encoder and decoder. Current methods often struggle with limited structural representation and insufficient contextual modeling, affecting generalization in complex clinical scenarios. We propose the DTEA model, featuring a new skip connection framework with the Semantic Topology Reconfiguration (STR) and Entropic Perturbation Gating (EPG) modules. STR reorganizes multi-scale semantic features into a dynamic hypergraph to better model cross-resolution anatomical dependencies, enhancing structural and semantic representation. EPG assesses channel stability after perturbation and filters high-entropy channels to emphasize clinically important regions and improve spatial attention. Extensive experiments on three benchmark datasets show our framework achieves superior segmentation accuracy and better generalization across various clinical settings. The code is available at \href{https://github.com/LWX-Research/DTEA}{https://github.com/LWX-Research/DTEA}.
CVApr 28
TopoMamba: Topology-Aware Scanning and Fusion for Segmenting Heterogeneous Medical Visual MediaFuchen Zheng, Chengpei Xu, Long Ma et al.
Visual state-space models (SSMs) have shown strong potential for medical image segmentation, yet their effectiveness is often limited by two practical issues: axis-biased scan ordering weakens the modeling of oblique and curved structures, and naive multi-branch fusion tends to amplify redundant responses. We present TopoMamba, a topology-aware scan-and-fuse framework for segmenting heterogeneous medical visual media. The method combines a diagonal/anti-diagonal TopoA-Scan branch with the standard Cross-Scan branch to provide complementary structural priors, and introduces ScanCache, a device-aware caching mechanism that amortizes explicit scan-index construction across recurring resolutions. To fuse heterogeneous scan features efficiently, we further propose a lightweight HSIC Gate that regulates branch interaction using a dependence-aware scalar gating rule. We also instantiate a volumetric TopoMamba-3D for practical 3D clinical segmentation. Experiments on Synapse CT, ISIC 2017 dermoscopy, and CVC-ClinicDB endoscopy show that TopoMamba consistently improves segmentation quality over strong CNN, Transformer, and SSM baselines, with particularly clear gains on thin or curved targets such as the pancreas and gallbladder, while maintaining favorable deployment efficiency under dynamic input resolutions. These results suggest that topology-aware scan ordering and lightweight dependence-aware fusion form an effective and practical design for medical multimedia segmentation. The code will be made publicly available.
CVOct 13, 2025
EEMS: Edge-Prompt Enhanced Medical Image Segmentation Based on Learnable Gating MechanismHan Xia, Quanjun Li, Qian Li et al.
Medical image segmentation is vital for diagnosis, treatment planning, and disease monitoring but is challenged by complex factors like ambiguous edges and background noise. We introduce EEMS, a new model for segmentation, combining an Edge-Aware Enhancement Unit (EAEU) and a Multi-scale Prompt Generation Unit (MSPGU). EAEU enhances edge perception via multi-frequency feature extraction, accurately defining boundaries. MSPGU integrates high-level semantic and low-level spatial features using a prompt-guided approach, ensuring precise target localization. The Dual-Source Adaptive Gated Fusion Unit (DAGFU) merges edge features from EAEU with semantic features from MSPGU, enhancing segmentation accuracy and robustness. Tests on datasets like ISIC2018 confirm EEMS's superior performance and reliability as a clinical tool.