Xinkun Wang

CV
h-index39
8papers
14citations
Novelty54%
AI Score50

8 Papers

LGDec 9, 2025
SOFA-FL: Self-Organizing Hierarchical Federated Learning with Adaptive Clustered Data Sharing

Yi Ni, Xinkun Wang, Han Zhang

Federated Learning (FL) faces significant challenges in evolving environments, particularly regarding data heterogeneity and the rigidity of fixed network topologies. To address these issues, this paper proposes \textbf{SOFA-FL} (Self-Organizing Hierarchical Federated Learning with Adaptive Clustered Data Sharing), a novel framework that enables hierarchical federated systems to self-organize and adapt over time. The framework is built upon three core mechanisms: (1) \textbf{Dynamic Multi-branch Agglomerative Clustering (DMAC)}, which constructs an initial efficient hierarchical structure; (2) \textbf{Self-organizing Hierarchical Adaptive Propagation and Evolution (SHAPE)}, which allows the system to dynamically restructure its topology through atomic operations -- grafting, pruning, consolidation, and purification -- to adapt to changes in data distribution; and (3) \textbf{Adaptive Clustered Data Sharing}, which mitigates data heterogeneity by enabling controlled partial data exchange between clients and cluster nodes. By integrating these mechanisms, SOFA-FL effectively captures dynamic relationships among clients and enhances personalization capabilities without relying on predetermined cluster structures.

34.0LGMar 27
Central-to-Local Adaptive Generative Diffusion Framework for Improving Gene Expression Prediction in Data-Limited Spatial Transcriptomics

Yaoyu Fang, Jiahe Qian, Xinkun Wang et al.

Spatial Transcriptomics (ST) provides spatially resolved gene expression profiles within intact tissue architecture, enabling molecular analysis in histological context. However, the high cost, limited throughput, and restricted data sharing of ST experiments result in severe data scarcity, constraining the development of robust computational models. To address this limitation, we present a Central-to-Local adaptive generative diffusion framework for ST (C2L-ST) that integrates large-scale morphological priors with limited molecular guidance. A global central model is first pretrained on extensive histopathology datasets to learn transferable morphological representations, and institution-specific local models are then adapted through lightweight gene-conditioned modulation using a small number of paired image-gene spots. This strategy enables the synthesis of realistic and molecularly consistent histology patches under data-limited conditions. The generated images exhibit high visual and structural fidelity, reproduce cellular composition, and show strong embedding overlap with real data across multiple organs, reflecting both realism and diversity. When incorporated into downstream training, synthetic image-gene pairs improve gene expression prediction accuracy and spatial coherence, achieving performance comparable to real data while requiring only a fraction of sampled spots. C2L-ST provides a scalable and data-efficient framework for molecular-level data augmentation, offering a domain-adaptive and generalizable approach for integrating histology and transcriptomics in spatial biology and related fields.

CVMar 7, 2024
ProMISe: Promptable Medical Image Segmentation using SAM

Jinfeng Wang, Sifan Song, Xinkun Wang et al.

With the proposal of the Segment Anything Model (SAM), fine-tuning SAM for medical image segmentation (MIS) has become popular. However, due to the large size of the SAM model and the significant domain gap between natural and medical images, fine-tuning-based strategies are costly with potential risk of instability, feature damage and catastrophic forgetting. Furthermore, some methods of transferring SAM to a domain-specific MIS through fine-tuning strategies disable the model's prompting capability, severely limiting its utilization scenarios. In this paper, we propose an Auto-Prompting Module (APM), which provides SAM-based foundation model with Euclidean adaptive prompts in the target domain. Our experiments demonstrate that such adaptive prompts significantly improve SAM's non-fine-tuned performance in MIS. In addition, we propose a novel non-invasive method called Incremental Pattern Shifting (IPS) to adapt SAM to specific medical domains. Experimental results show that the IPS enables SAM to achieve state-of-the-art or competitive performance in MIS without the need for fine-tuning. By coupling these two methods, we propose ProMISe, an end-to-end non-fine-tuned framework for Promptable Medical Image Segmentation. Our experiments demonstrate that both using our methods individually or in combination achieves satisfactory performance in low-cost pattern shifting, with all of SAM's parameters frozen.

IVJul 29, 2025
ST-DAI: Single-shot 2.5D Spatial Transcriptomics with Intra-Sample Domain Adaptive Imputation for Cost-efficient 3D Reconstruction

Jiahe Qian, Yaoyu Fang, Xinkun Wang et al.

For 3D spatial transcriptomics (ST), the high per-section acquisition cost of fully sampling every tissue section remains a significant challenge. Although recent approaches predict gene expression from histology images, these methods require large external datasets, which leads to high-cost and suffers from substantial domain discrepancies that lead to poor generalization on new samples. In this work, we introduce ST-DAI, a single-shot framework for 3D ST that couples a cost-efficient 2.5D sampling scheme with an intra-sample domain-adaptive imputation framework. First, in the cost-efficient 2.5D sampling stage, one reference section (central section) is fully sampled while other sections (adjacent sections) is sparsely sampled, thereby capturing volumetric context at significantly reduced experimental cost. Second, we propose a single-shot 3D imputation learning method that allows us to generate fully sampled 3D ST from this cost-efficient 2.5D ST scheme, using only sample-specific training. We observe position misalignment and domain discrepancy between sections. To address those issues, we adopt a pipeline that first aligns the central section to the adjacent section, thereafter generates dense pseudo-supervision on the central section, and then performs Fast Multi-Domain Refinement (FMDR), which adapts the network to the domain of the adjacent section while fine-tuning only a few parameters through the use of Parameter-Efficient Domain-Alignment Layers (PDLs). During this refinement, a Confidence Score Generator (CSG) reweights the pseudo-labels according to their estimated reliability, thereby directing imputation toward trustworthy regions. Our experimental results demonstrate that ST-DAI achieves gene expression prediction performance comparable to fully sampled approaches while substantially reducing the measurement burden.

CVJul 22, 2025
Sparser2Sparse: Single-shot Sparser-to-Sparse Learning for Spatial Transcriptomics Imputation with Natural Image Co-learning

Yaoyu Fang, Jiahe Qian, Xinkun Wang et al.

Spatial transcriptomics (ST) has revolutionized biomedical research by enabling high resolution gene expression profiling within tissues. However, the high cost and scarcity of high resolution ST data remain significant challenges. We present Single-shot Sparser-to-Sparse (S2S-ST), a novel framework for accurate ST imputation that requires only a single and low-cost sparsely sampled ST dataset alongside widely available natural images for co-training. Our approach integrates three key innovations: (1) a sparser-to-sparse self-supervised learning strategy that leverages intrinsic spatial patterns in ST data, (2) cross-domain co-learning with natural images to enhance feature representation, and (3) a Cascaded Data Consistent Imputation Network (CDCIN) that iteratively refines predictions while preserving sampled gene data fidelity. Extensive experiments on diverse tissue types, including breast cancer, liver, and lymphoid tissue, demonstrate that our method outperforms state-of-the-art approaches in imputation accuracy. By enabling robust ST reconstruction from sparse inputs, our framework significantly reduces reliance on costly high resolution data, facilitating potential broader adoption in biomedical research and clinical applications.

CVMar 7, 2025
Robust Multimodal Learning for Ophthalmic Disease Grading via Disentangled Representation

Xinkun Wang, Yifang Wang, Senwei Liang et al.

This paper discusses how ophthalmologists often rely on multimodal data to improve diagnostic accuracy. However, complete multimodal data is rare in real-world applications due to a lack of medical equipment and concerns about data privacy. Traditional deep learning methods typically address these issues by learning representations in latent space. However, the paper highlights two key limitations of these approaches: (i) Task-irrelevant redundant information (e.g., numerous slices) in complex modalities leads to significant redundancy in latent space representations. (ii) Overlapping multimodal representations make it difficult to extract unique features for each modality. To overcome these challenges, the authors propose the Essence-Point and Disentangle Representation Learning (EDRL) strategy, which integrates a self-distillation mechanism into an end-to-end framework to enhance feature selection and disentanglement for more robust multimodal learning. Specifically, the Essence-Point Representation Learning module selects discriminative features that improve disease grading performance. The Disentangled Representation Learning module separates multimodal data into modality-common and modality-unique representations, reducing feature entanglement and enhancing both robustness and interpretability in ophthalmic disease diagnosis. Experiments on multimodal ophthalmology datasets show that the proposed EDRL strategy significantly outperforms current state-of-the-art methods.

CVNov 17, 2025
HiFusion: Hierarchical Intra-Spot Alignment and Regional Context Fusion for Spatial Gene Expression Prediction from Histopathology

Ziqiao Weng, Yaoyu Fang, Jiahe Qian et al.

Spatial transcriptomics (ST) bridges gene expression and tissue morphology but faces clinical adoption barriers due to technical complexity and prohibitive costs. While computational methods predict gene expression from H&E-stained whole-slide images (WSIs), existing approaches often fail to capture the intricate biological heterogeneity within spots and are susceptible to morphological noise when integrating contextual information from surrounding tissue. To overcome these limitations, we propose HiFusion, a novel deep learning framework that integrates two complementary components. First, we introduce the Hierarchical Intra-Spot Modeling module that extracts fine-grained morphological representations through multi-resolution sub-patch decomposition, guided by a feature alignment loss to ensure semantic consistency across scales. Concurrently, we present the Context-aware Cross-scale Fusion module, which employs cross-attention to selectively incorporate biologically relevant regional context, thereby enhancing representational capacity. This architecture enables comprehensive modeling of both cellular-level features and tissue microenvironmental cues, which are essential for accurate gene expression prediction. Extensive experiments on two benchmark ST datasets demonstrate that HiFusion achieves state-of-the-art performance across both 2D slide-wise cross-validation and more challenging 3D sample-specific scenarios. These results underscore HiFusion's potential as a robust, accurate, and scalable solution for ST inference from routine histopathology.

CVSep 21, 2025
Learning from Gene Names, Expression Values and Images: Contrastive Masked Text-Image Pretraining for Spatial Transcriptomics Representation Learning

Jiahe Qian, Yaoyu Fang, Ziqiao Weng et al.

Spatial transcriptomics aims to connect high-resolution histology images with spatially resolved gene expression. To achieve better performance on downstream tasks such as gene expression prediction, large-scale pre-training is required to obtain generalisable representations that can bridge histology and transcriptomics across tissues, protocols, and laboratories. Existing cross-modal pre-training approaches for spatial transcriptomics rely on either gene names or expression values in isolation, which strips the gene branch of essential semantics and breaks the association between each gene and its quantitative magnitude. In addition, by restricting supervision to image-text alignment, these methods ignore intrinsic visual cues that are critical for learning robust image features. We present CoMTIP, the first Contrastive Masked Text-Image Pretraining framework that jointly learns from images, gene names, and expression values while capturing fine-grained visual context for spatial transcriptomics. The vision branch uses Masked Feature Modeling to reconstruct occluded patches and learn context-aware image embeddings. The text branch applies a scalable Gene-Text Encoder that processes all gene sentences in parallel, enriches each gene and its numerical value with dedicated embeddings, and employs Pair-aware Adversarial Training (PAAT) to preserve correct gene-value associations. Image and text representations are aligned in a shared InfoNCE-optimised space. Experiments on public spatial transcriptomics datasets show that CoMTIP not only surpasses previous methods on diverse downstream tasks but also achieves zero-shot gene expression prediction, a capability that existing approaches do not provide.