Ruizhi Zhang

CV
h-index20
13papers
39citations
Novelty50%
AI Score52

13 Papers

1.2CVMay 25
Cross-Stage Attention Multi-Expert Network for Radiologist-Inspired Breast Ultrasound Diagnosis

Xinyang Zhai, Chong Yang, Ruizhi Zhang

Breast ultrasound imaging is an important noninvasive method for early breast cancer diagnosis, but automatic benign/malignant classification remains challenging due to tumor heterogeneity, blurred boundaries, and data imbalance. To improve feature representation and classification accuracy, this paper proposes the Cross-Stage Attention Mixture-of-Experts Network (CSA-MoE-Net). It adopts a Cross-Stage Attention-enhanced ResNet-18 as the backbone, in which the Cross-Stage Attention module adaptively recalibrates multi-level features, thereby enhancing key tumor features and suppressing redundancy. A three-branch Mixture of Experts (MoE) Block learns complementary features from the Whole Tumor Image, Tumor Core, and Boundary, and an Adaptive Gating Network fuses them to capture morphological, textural, and contextual information. The fused features are denoted as Fused Expert Feature (FEF) in the architecture. Experiments on a balanced dataset of 2,129 breast ultrasound images show that, averaged over 20 independent runs, the model achieves an accuracy of 96.33\%, precision of 94.09\%, recall of 98.53\%, F1-score of 96.25\%, and AUC of 99.50\%. Compared to the baseline ResNet-18, these metrics improve by 3.01, 0.70, 5.37, 2.98, and 5.42 percentage points, respectively. The proposed mechanism requires no invasive modification and can be seamlessly embedded into VGG-16, DenseNet-121, etc., yielding stable performance gains, thus providing reliable support for computer-aided diagnosis.

SIJun 20, 2022
Temporal Link Prediction via Adjusted Sigmoid Function and 2-Simplex Sructure

Ruizhi Zhang, Qiaozi Wang, Qiming Yang et al.

Temporal network link prediction is an important task in the field of network science, and has a wide range of applications in practical scenarios. Revealing the evolutionary mechanism of the network is essential for link prediction, and how to effectively utilize the historical information for temporal links and efficiently extract the high-order patterns of network structure remains a vital challenge. To address these issues, in this paper, we propose a novel temporal link prediction model with adjusted sigmoid function and 2-simplex structure (TLPSS). The adjusted sigmoid decay mode takes the active, decay and stable states of edges into account, which properly fits the life cycle of information. Moreover, the latent matrix sequence is introduced, which is composed of simplex high-order structure, to enhance the performance of link prediction method since it is highly feasible in sparse network. Combining the life cycle of information and simplex high-order structure, the overall performance of TLPSS is achieved by satisfying the consistency of temporal and structural information in dynamic networks. Experimental results on six real-world datasets demonstrate the effectiveness of TLPSS, and our proposed model improves the performance of link prediction by an average of 15% compared to other baseline methods.

CVJan 15
DW-DGAT: Dynamically Weighted Dual Graph Attention Network for Neurodegenerative Disease Diagnosis

Chengjia Liang, Zhenjiong Wang, Chao Chen et al.

Parkinson's disease (PD) and Alzheimer's disease (AD) are the two most prevalent and incurable neurodegenerative diseases (NDs) worldwide, for which early diagnosis is critical to delay their progression. However, the high dimensionality of multi-metric data with diverse structural forms, the heterogeneity of neuroimaging and phenotypic data, and class imbalance collectively pose significant challenges to early ND diagnosis. To address these challenges, we propose a dynamically weighted dual graph attention network (DW-DGAT) that integrates: (1) a general-purpose data fusion strategy to merge three structural forms of multi-metric data; (2) a dual graph attention architecture based on brain regions and inter-sample relationships to extract both micro- and macro-level features; and (3) a class weight generation mechanism combined with two stable and effective loss functions to mitigate class imbalance. Rigorous experiments, based on the Parkinson Progression Marker Initiative (PPMI) and Alzhermer's Disease Neuroimaging Initiative (ADNI) studies, demonstrate the state-of-the-art performance of our approach.

SIApr 28, 2022
Enhance Ambiguous Community Structure via Multi-strategy Community Related Link Prediction Method with Evolutionary Process

Qiming Yang, Wei Wei, Ruizhi Zhang et al.

Most real-world networks suffer from incompleteness or incorrectness, which is an inherent attribute to real-world datasets. As a consequence, those downstream machine learning tasks in complex network like community detection methods may yield less satisfactory results, i.e., a proper preprocessing measure is required here. To address this issue, in this paper, we design a new community attribute based link prediction strategy HAP and propose a two-step community enhancement algorithm with automatic evolution process based on HAP. This paper aims at providing a community enhancement measure through adding links to clarify ambiguous community structures. The HAP method takes the neighbourhood uncertainty and Shannon entropy to identify boundary nodes, and establishes links by considering the nodes' community attributes and community size at the same time. The experimental results on twelve real-world datasets with ground truth community indicate that the proposed link prediction method outperforms other baseline methods and the enhancement of community follows the expected evolution process.

ROMar 4
Sim2Sea: Sim-to-Real Policy Transfer for Maritime Vessel Navigation in Congested Waters

Xinyu Cui, Xuanfa Jin, Xue Yan et al.

Autonomous navigation in congested maritime environments is a critical capability for a wide range of real-world applications. However, it remains an unresolved challenge due to complex vessel interactions and significant environmental uncertainties. Existing methods often fail in practical deployment due to a substantial sim-to-real gap, which stems from imprecise simulation, inadequate situational awareness, and unsafe exploration strategies. To address these, we propose \textbf{Sim2Sea}, a comprehensive framework designed to bridge simulation and real-world execution. Sim2Sea advances in three key aspects. First, we develop a GPU-accelerated parallel simulator for scalable and accurate maritime scenario simulation. Second, we design a dual-stream spatiotemporal policy that handles complex dynamics and multi-modal perception, augmented with a velocity-obstacle-guided action masking mechanism to ensure safe and efficient exploration. Finally, a targeted domain randomization scheme helps bridge the sim-to-real gap. Simulation results demonstrate that our method achieves faster convergence and safer trajectories than established baselines. In addition, our policy trained purely in simulation successfully transfers zero-shot to a 17-ton unmanned vessel operating in real-world congested waters. These results validate the effectiveness of Sim2Sea in achieving reliable sim-to-real transfer for practical autonomous maritime navigation.

ROMar 3
Give me scissors: Collision-Free Dual-Arm Surgical Assistive Robot for Instrument Delivery

Xuejin Luo, Shiquan Sun, Runshi Zhang et al.

During surgery, scrub nurses are required to frequently deliver surgical instruments to surgeons, which can lead to physical fatigue and decreased focus. Robotic scrub nurses provide a promising solution that can replace repetitive tasks and enhance efficiency. Existing research on robotic scrub nurses relies on predefined paths for instrument delivery, which limits their generalizability and poses safety risks in dynamic environments. To address these challenges, we present a collision-free dual-arm surgical assistive robot capable of performing instrument delivery. A vision-language model is utilized to automatically generate the robot's grasping and delivery trajectories in a zero-shot manner based on surgeons' instructions. A real-time obstacle minimum distance perception method is proposed and integrated into a unified quadratic programming framework. This framework ensures reactive obstacle avoidance and self-collision prevention during the dual-arm robot's autonomous movement in dynamic environments. Extensive experimental validations demonstrate that the proposed robotic system achieves an 83.33% success rate in surgical instrument delivery while maintaining smooth, collision-free movement throughout all trials. The project page and source code are available at https://give-me-scissors.github.io/.

28.4STApr 14
Sequential Change Detection for Multiple Data Streams with Differential Privacy

Lixing Zhang, Liyan Xie, Ruizhi Zhang

Sequential change-point detection seeks to rapidly identify distributional changes in streaming data while controlling false alarms. Existing multi-stream detection methods typically rely on non-private access to raw observations or intermediate statistics, limiting their usage in privacy-sensitive settings. We study sequential change-point detection for multiple data streams under differential privacy constraints. We consider multiple independent streams undergoing a synchronized change at an unknown time and in an unknown subset of streams, and propose DP-SUM-CUSUM, a differentially private detection procedure based on the summation of per-stream CUSUM statistics with calibrated Laplace noise injection. We show that DP-SUM-CUSUM satisfies sequential $\varepsilon$-differential privacy and derive bounds on the average run length to false alarm and the worst-case average detection delay, explicitly characterizing the privacy--efficiency tradeoff. A truncation-based extension is also presented to handle distributional shifts with unbounded log-likelihood ratios. Simulations and experiments on an Internet of Things (IoT) botnet dataset validate the proposed approach.

76.7SEMar 14
Coding with Eyes: Visual Feedback Unlocks Reliable GUI Code Generating and Debugging

Zhilin Liu, Ye Huang, Ting Xie et al.

Recent advances in Large Language Model (LLM)-based agents have shown remarkable progress in code generation. However, current agent methods mainly rely on text-output-based feedback (e.g. command-line outputs) for multi-round debugging and struggle in graphical user interface (GUI) that involve visual information. This is mainly due to two limitations: 1) GUI programs are event-driven, yet existing methods cannot simulate user interactions to trigger GUI element logic 2) GUI programs possess visual attributes, making it difficult for text-based approaches to assess whether the rendered interface meets user needs. To systematically address these challenges, we first introduce InteractGUI Bench, a novel benchmark comprising 984 commonly used real-world desktop GUI application tasks designed for fine-grained evaluation of both interaction logic and visual structure. Furthermore, we propose VF-Coder, a vision-feedback-based multi-agent system for debugging GUI code. By perceiving visual information and directly interacting with program interfaces, VF-Coder can identify potential logic and layout issues in a human-like manner. On InteractGUI Bench, our VF-Coder approach increases the success rate of Gemini-3-Flash from 21.68% to 28.29% and raises the visual score from 0.4284 to 0.5584, indicating the effectiveness of visual feedback in GUI debugging.

64.7CVApr 9
PokeGym: A Visually-Driven Long-Horizon Benchmark for Vision-Language Models

Ruizhi Zhang, Ye Huang, Yuangang Pan et al.

While Vision-Language Models (VLMs) have achieved remarkable progress in static visual understanding, their deployment in complex 3D embodied environments remains severely limited. Existing benchmarks suffer from four critical deficiencies: (1) passive perception tasks circumvent interactive dynamics; (2) simplified 2D environments fail to assess depth perception; (3) privileged state leakage bypasses genuine visual processing; and (4) human evaluation is prohibitively expensive and unscalable. We introduce PokeGym, a visually-driven long-horizon benchmark instantiated within Pokemon Legends: Z-A, a visually complex 3D open-world Role-Playing Game. PokeGym enforces strict code-level isolation: agents operate solely on raw RGB observations while an independent evaluator verifies success via memory scanning, ensuring pure vision-based decision-making and automated, scalable assessment. The benchmark comprises 30 tasks (30-220 steps) spanning navigation, interaction, and mixed scenarios, with three instruction granularities (Visual-Guided, Step-Guided, Goal-Only) to systematically deconstruct visual grounding, semantic reasoning, and autonomous exploration capabilities. Our evaluation reveals a key limitation of current VLMs: physical deadlock recovery, rather than high-level planning, constitutes the primary bottleneck, with deadlocks showing a strong negative correlation with task success. Furthermore, we uncover a metacognitive divergence: weaker models predominantly suffer from Unaware Deadlocks (oblivious to entrapment), whereas advanced models exhibit Aware Deadlocks (recognizing entrapment yet failing to recover). These findings highlight the need to integrate explicit spatial intuition into VLM architectures. The code and benchmark will be available on GitHub.

CEDec 16, 2025
Wearable-informed generative digital avatars predict task-conditioned post-stroke locomotion

Yanning Dai, Chenyu Tang, Ruizhi Zhang et al.

Dynamic prediction of locomotor capacity after stroke could enable more individualized rehabilitation, yet current assessments largely provide static impairment scores and do not indicate whether patients can perform specific tasks such as slope walking or stair climbing. Here, we present a wearable-informed data-physics hybrid generative framework that reconstructs a stroke survivor's locomotor control from wearable inertial sensing and predicts task-conditioned post-stroke locomotion in new environments. From a single 20 m level-ground walking trial recorded by five IMUs, the framework personalizes a physics-based digital avatar using a healthy-motion prior and hybrid imitation learning, generating dynamically feasible, patient-specific movements for inclined walking and stair negotiation. Across 11 stroke inpatients, predicted postures reached 82.2% similarity for slopes and 69.9% for stairs, substantially exceeding a physics-only baseline. In a multicentre pilot randomized study (n = 21; 28 days), access to scenario-specific locomotion predictions to support task selection and difficulty titration was associated with larger gains in Fugl-Meyer lower-extremity scores than standard care (mean change 6.0 vs 3.7 points; $p < 0.05$). These results suggest that wearable-informed generative digital avatars may augment individualized gait rehabilitation planning and provide a pathway toward dynamically personalized post-stroke motor recovery strategies.

LGMar 4, 2025
Artificial Intelligence in Reactor Physics: Current Status and Future Prospects

Ruizhi Zhang, Shengfeng Zhu, Kan Wang et al.

Reactor physics is the study of neutron properties, focusing on using models to examine the interactions between neutrons and materials in nuclear reactors. Artificial intelligence (AI) has made significant contributions to reactor physics, e.g., in operational simulations, safety design, real-time monitoring, core management and maintenance. This paper presents a comprehensive review of AI approaches in reactor physics, especially considering the category of Machine Learning (ML), with the aim of describing the application scenarios, frontier topics, unsolved challenges and future research directions. From equation solving and state parameter prediction to nuclear industry applications, this paper provides a step-by-step overview of ML methods applied to steady-state, transient and combustion problems. Most literature works achieve industry-demanded models by enhancing the efficiency of deterministic methods or correcting uncertainty methods, which leads to successful applications. However, research on ML methods in reactor physics is somewhat fragmented, and the ability to generalize models needs to be strengthened. Progress is still possible, especially in addressing theoretical challenges and enhancing industrial applications such as building surrogate models and digital twins.

HCNov 28, 2024
An AI-Driven Multimodal Smart Home Platform for Continuous Monitoring and Assistance in Post-Stroke Motor Impairment

Chenyu Tang, Ruizhi Zhang, Shuo Gao et al.

At-home rehabilitation for post-stroke patients presents significant challenges, as continuous, personalized care is often limited outside clinical settings. Moreover, the lack of integrated solutions capable of simultaneously monitoring motor recovery and providing intelligent assistance in home environments hampers rehabilitation outcomes. Here, we present a multimodal smart home platform designed for continuous, at-home rehabilitation of post-stroke patients, integrating wearable sensing, ambient monitoring, and adaptive automation. A plantar pressure insole equipped with a machine learning pipeline classifies users into motor recovery stages with up to 94\% accuracy, enabling quantitative tracking of walking patterns during daily activities. An optional head-mounted eye-tracking module, together with ambient sensors such as cameras and microphones, supports seamless hands-free control of household devices with a 100\% success rate and sub-second response time. These data streams are fused locally via a hierarchical Internet of Things (IoT) architecture, ensuring low latency and data privacy. An embedded large language model (LLM) agent, Auto-Care, continuously interprets multimodal data to provide real-time interventions -- issuing personalized reminders, adjusting environmental conditions, and notifying caregivers. Implemented in a post-stroke context, this integrated smart home platform increased mean user satisfaction from 3.9 $\pm$ 0.8 in conventional home environments to 8.4 $\pm$ 0.6 with the full system ($n=20$). Beyond stroke, the system offers a scalable, patient-centered framework with potential for long-term use in broader neurorehabilitation and aging-in-place applications.

MLDec 18, 2020
Adversarially Robust Estimate and Risk Analysis in Linear Regression

Yue Xing, Ruizhi Zhang, Guang Cheng

Adversarially robust learning aims to design algorithms that are robust to small adversarial perturbations on input variables. Beyond the existing studies on the predictive performance to adversarial samples, our goal is to understand statistical properties of adversarially robust estimates and analyze adversarial risk in the setup of linear regression models. By discovering the statistical minimax rate of convergence of adversarially robust estimators, we emphasize the importance of incorporating model information, e.g., sparsity, in adversarially robust learning. Further, we reveal an explicit connection of adversarial and standard estimates, and propose a straightforward two-stage adversarial learning framework, which facilitates to utilize model structure information to improve adversarial robustness. In theory, the consistency of the adversarially robust estimator is proven and its Bahadur representation is also developed for the statistical inference purpose. The proposed estimator converges in a sharp rate under either low-dimensional or sparse scenario. Moreover, our theory confirms two phenomena in adversarially robust learning: adversarial robustness hurts generalization, and unlabeled data help improve the generalization. In the end, we conduct numerical simulations to verify our theory.