Zixin Hu

CV
h-index18
7papers
11citations
Novelty49%
AI Score52

7 Papers

QMMar 13, 2023
CoGANPPIS: A Coevolution-enhanced Global Attention Neural Network for Protein-Protein Interaction Site Prediction

Jiaxing Guo, Xuening Zhu, Zixin Hu et al.

Protein-protein interactions are of great importance in biochemical processes. Accurate prediction of protein-protein interaction sites (PPIs) is crucial for our understanding of biological mechanism. Although numerous approaches have been developed recently and achieved gratifying results, there are still two limitations: (1) Most existing models have excavated a number of useful input features, but failed to take coevolutionary features into account, which could provide clues for inter-residue relationships; (2) The attention-based models only allocate attention weights for neighboring residues, instead of doing it globally, which may limit the model's prediction performance since some residues being far away from the target residues might also matter. We propose a coevolution-enhanced global attention neural network, a sequence-based deep learning model for PPIs prediction, called CoGANPPIS. Specifically, CoGANPPIS utilizes three layers in parallel for feature extraction: (1) Local-level representation aggregation layer, which aggregates the neighboring residues' features as the local feature representation; (2) Global-level representation learning layer, which employs a novel coevolution-enhanced global attention mechanism to allocate attention weights to all residues on the same protein sequences; (3) Coevolutionary information learning layer, which applies CNN & pooling to coevolutionary information to obtain the coevolutionary profile representation. Then, the three outputs are concatenated and passed into several fully connected layers for the final prediction. Extensive experiments on two benchmark datasets have been conducted, demonstrating that our proposed model achieves the state-of-the-art performance.

CVMar 12
Developing Foundation Models for Universal Segmentation from 3D Whole-Body Positron Emission Tomography

Yichi Zhang, Le Xue, Wenbo Zhang et al.

Positron emission tomography (PET) is a key nuclear medicine imaging modality that visualizes radiotracer distributions to quantify in vivo physiological and metabolic processes, playing an irreplaceable role in disease management. Despite its clinical importance, the development of deep learning models for quantitative PET image analysis remains severely limited, driven by both the inherent segmentation challenge from PET's paucity of anatomical contrast and the high costs of data acquisition and annotation. To bridge this gap, we develop generalist foundational models for universal segmentation from 3D whole-body PET imaging. We first build the largest and most comprehensive PET dataset to date, comprising 11041 3D whole-body PET scans with 59831 segmentation masks for model development. Based on this dataset, we present SegAnyPET, an innovative foundational model with general-purpose applicability to diverse segmentation tasks. Built on a 3D architecture with a prompt engineering strategy for mask generation, SegAnyPET enables universal and scalable organ and lesion segmentation, supports efficient human correction with minimal effort, and enables a clinical human-in-the-loop workflow. Extensive evaluations on multi-center, multi-tracer, multi-disease datasets demonstrate that SegAnyPET achieves strong zero-shot performance across a wide range of segmentation tasks, highlighting its potential to advance the clinical applications of molecular imaging.

CVApr 27
SemiSAM-O1: How far can we push the boundary of annotation-efficient medical image segmentation?

Yichi Zhang, Le Xue, Bichun Xu et al.

Semi-supervised learning (SSL) has become a promising solution to alleviate the annotation burden of deep learning-based medical image segmentation models. While recent advances in foundation model-driven SSL have pushed the boundary to extremely limited annotation scenarios, they fail to maintain robust competitive performance in complex imaging modalities. In this paper, we propose SemiSAM-O1, an annotation-efficient framework using only one annotated template image for segmentation. SemiSAM-O1 extends the specialist-generalist collaborative learning framework to the extreme one-label setting by fully exploiting the foundation model's feature representation capability beyond its prompting interface. SemiSAM-O1 operates in two stages. In the first stage, the foundation model's encoder extracts dense features from all volumes, and class prototypes derived from the single annotated template are propagated to the unlabeled pool via feature similarity to produce coarse initial pseudo-labels. In the second stage, an iterative training-and-refinement loop progressively improves both the segmentation model and the pseudo-labels over multiple rounds, where each round trains the model from scratch on current pseudo-labels and generates updated predictions with voxel-wise uncertainty estimates. An uncertainty-guided refinement step further leverages the foundation model's global feature space to correct high-uncertainty regions by aggregating labels from their most similar confident neighbors, establishing a virtuous cycle of mutual improvement. Extensive experiments on a wide range of segmentation tasks across different modalities and anatomical targets demonstrate that SemiSAM-O1 significantly narrows the performance gap between one-label semi-supervised learning and full supervision, while significantly reducing the computational overhead of online foundation model inference.

LGJan 21
FSX: Message Flow Sensitivity Enhanced Structural Explainer for Graph Neural Networks

Bizu Feng, Zhimu Yang, Shaode Yu et al.

Despite the widespread success of Graph Neural Networks (GNNs), understanding the reasons behind their specific predictions remains challenging. Existing explainability methods face a trade-off that gradient-based approaches are computationally efficient but often ignore structural interactions, while game-theoretic techniques capture interactions at the cost of high computational overhead and potential deviation from the model's true reasoning path. To address this gap, we propose FSX (Message Flow Sensitivity Enhanced Structural Explainer), a novel hybrid framework that synergistically combines the internal message flows of the model with a cooperative game approach applied to the external graph data. FSX first identifies critical message flows via a novel flow-sensitivity analysis: during a single forward pass, it simulates localized node perturbations and measures the resulting changes in message flow intensities. These sensitivity-ranked flows are then projected onto the input graph to define compact, semantically meaningful subgraphs. Within each subgraph, a flow-aware cooperative game is conducted, where node contributions are evaluated fairly through a Shapley-like value that incorporates both node-feature importance and their roles in sustaining or destabilizing the identified critical flows. Extensive evaluation across multiple datasets and GNN architectures demonstrates that FSX achieves superior explanation fidelity with significantly reduced runtime, while providing unprecedented insights into the structural logic underlying model predictions--specifically, how important sub-structures exert influence by governing the stability of key internal computational pathways.

AO-PHDec 15, 2023
FuXi-S2S: A machine learning model that outperforms conventional global subseasonal forecast models

Lei Chen, Xiaohui Zhong, Hao Li et al.

Skillful subseasonal forecasts are crucial for various sectors of society but pose a grand scientific challenge. Recently, machine learning based weather forecasting models outperform the most successful numerical weather predictions generated by the European Centre for Medium-Range Weather Forecasts (ECMWF), but have not yet surpassed conventional models at subseasonal timescales. This paper introduces FuXi Subseasonal-to-Seasonal (FuXi-S2S), a machine learning model that provides global daily mean forecasts up to 42 days, encompassing five upper-air atmospheric variables at 13 pressure levels and 11 surface variables. FuXi-S2S, trained on 72 years of daily statistics from ECMWF ERA5 reanalysis data, outperforms the ECMWF's state-of-the-art Subseasonal-to-Seasonal model in ensemble mean and ensemble forecasts for total precipitation and outgoing longwave radiation, notably enhancing global precipitation forecast. The improved performance of FuXi-S2S can be primarily attributed to its superior capability to capture forecast uncertainty and accurately predict the Madden-Julian Oscillation (MJO), extending the skillful MJO prediction from 30 days to 36 days. Moreover, FuXi-S2S not only captures realistic teleconnections associated with the MJO, but also emerges as a valuable tool for discovering precursor signals, offering researchers insights and potentially establishing a new paradigm in Earth system science research.

IVAug 6, 2025
PET2Rep: Towards Vision-Language Model-Drived Automated Radiology Report Generation for Positron Emission Tomography

Yichi Zhang, Wenbo Zhang, Zehui Ling et al.

Positron emission tomography (PET) is a cornerstone of modern oncologic and neurologic imaging, distinguished by its unique ability to illuminate dynamic metabolic processes that transcend the anatomical focus of traditional imaging technologies. Radiology reports are essential for clinical decision making, yet their manual creation is labor-intensive and time-consuming. Recent advancements of vision-language models (VLMs) have shown strong potential in medical applications, presenting a promising avenue for automating report generation. However, existing applications of VLMs in the medical domain have predominantly focused on structural imaging modalities, while the unique characteristics of molecular PET imaging have largely been overlooked. To bridge the gap, we introduce PET2Rep, a large-scale comprehensive benchmark for evaluation of general and medical VLMs for radiology report generation for PET images. PET2Rep stands out as the first dedicated dataset for PET report generation with metabolic information, uniquely capturing whole-body image-report pairs that cover dozens of organs to fill the critical gap in existing benchmarks and mirror real-world clinical comprehensiveness. In addition to widely recognized natural language generation metrics, we introduce a series of clinical efficacy metrics to evaluate the quality of radiotracer uptake pattern description in key organs in generated reports. We conduct a head-to-head comparison of 30 cutting-edge general-purpose and medical-specialized VLMs. The results show that the current state-of-the-art VLMs perform poorly on PET report generation task, falling considerably short of fulfilling practical needs. Moreover, we identify several key insufficiency that need to be addressed to advance the development in medical applications.

AIJun 14, 2025
Efficient Network Automatic Relevance Determination

Hongwei Zhang, Ziqi Ye, Xinyuan Wang et al.

We propose Network Automatic Relevance Determination (NARD), an extension of ARD for linearly probabilistic models, to simultaneously model sparse relationships between inputs $X \in \mathbb R^{d \times N}$ and outputs $Y \in \mathbb R^{m \times N}$, while capturing the correlation structure among the $Y$. NARD employs a matrix normal prior which contains a sparsity-inducing parameter to identify and discard irrelevant features, thereby promoting sparsity in the model. Algorithmically, it iteratively updates both the precision matrix and the relationship between $Y$ and the refined inputs. To mitigate the computational inefficiencies of the $\mathcal O(m^3 + d^3)$ cost per iteration, we introduce Sequential NARD, which evaluates features sequentially, and a Surrogate Function Method, leveraging an efficient approximation of the marginal likelihood and simplifying the calculation of determinant and inverse of an intermediate matrix. Combining the Sequential update with the Surrogate Function method further reduces computational costs. The computational complexity per iteration for these three methods is reduced to $\mathcal O(m^3+p^3)$, $\mathcal O(m^3 + d^2)$, $\mathcal O(m^3+p^2)$, respectively, where $p \ll d$ is the final number of features in the model. Our methods demonstrate significant improvements in computational efficiency with comparable performance on both synthetic and real-world datasets.