Zhuo He

CV
h-index: 15
Papers: 12
Citations: 55
Novelty: 48%
AI Score: 45

12 Papers

SP · Jul 10, 2024
Generative AI for RF Sensing in IoT systems

Li Wang, Chao Zhang, Qiyang Zhao et al.

The development of wireless sensing technologies, using signals such as Wi-Fi, infrared, and RF to gather environmental data, has significantly advanced within Internet of Things (IoT) systems. Among these, Radio Frequency (RF) sensing stands out for its cost-effective and non-intrusive monitoring of human activities and environmental changes. However, traditional RF sensing methods face significant challenges, including noise, interference, incomplete data, and high deployment costs, which limit their effectiveness and scalability. This paper investigates the potential of Generative AI (GenAI) to overcome these limitations within the IoT ecosystem. We provide a comprehensive review of state-of-the-art GenAI techniques, focusing on their application to RF sensing problems. By generating high-quality synthetic data, enhancing signal quality, and integrating multi-modal data, GenAI offers robust solutions for RF environment reconstruction, localization, and imaging. Additionally, GenAI's ability to generalize enables IoT devices to adapt to new environments and unseen tasks, improving their efficiency and performance. The main contributions of this article include a detailed analysis of the challenges in RF sensing, the presentation of innovative GenAI-based solutions, and the proposal of a unified framework for diverse RF sensing tasks. Through case studies, we demonstrate the effectiveness of integrating GenAI models, leading to advanced, scalable, and intelligent IoT systems.

SP · Jun 2, 2023
A new method using deep transfer learning on ECG to predict the response to cardiac resynchronization therapy

Zhuo He, Hongjin Si, Xinwei Zhang et al.

Background: Cardiac resynchronization therapy (CRT) has emerged as an effective treatment for heart failure patients with electrical dyssynchrony. However, accurately predicting which patients will respond to CRT remains a challenge. This study explores the application of deep transfer learning techniques to train a predictive model for CRT response. Methods: The short-time Fourier transform (STFT) was employed to transform ECG signals into two-dimensional images. A transfer learning approach was then applied: a convolutional neural network (CNN) model was pre-trained on the MIT-BIH ECG database. The model was fine-tuned to extract relevant features from the ECG images, and then tested on our dataset of CRT patients to predict their response. Results: Seventy-one CRT patients were enrolled in this study. The transfer learning model achieved an accuracy of 72% in distinguishing responders from non-responders in the local dataset. Furthermore, the model showed good sensitivity (0.78) and specificity (0.79) in identifying CRT responders. Our model outperformed clinical guidelines and traditional machine learning approaches. Conclusion: Using ECG images as input together with transfer learning allows for improved accuracy in identifying CRT responders. This approach offers potential for enhancing patient selection and improving outcomes of CRT.
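
The STFT step mentioned in this abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `stft_magnitude`, the Hann window, the window length, the hop size, and the synthetic 10 Hz test signal are all assumptions chosen for illustration. It only shows how a 1-D signal becomes a 2-D time-frequency array that could be fed to an image model.

```python
import numpy as np

def stft_magnitude(signal, fs, win_len=128, hop=32):
    """Hann-windowed STFT magnitude of a 1-D signal.

    Returns (freqs, times, magnitude), where magnitude has shape
    (n_freq_bins, n_frames). All parameter defaults are illustrative.
    """
    signal = np.asarray(signal, dtype=float)
    window = np.hanning(win_len)
    n_frames = 1 + (len(signal) - win_len) // hop
    # Slice the signal into overlapping frames and apply the window.
    frames = np.stack([
        signal[i * hop : i * hop + win_len] * window
        for i in range(n_frames)
    ])
    # One real FFT per frame, then transpose so rows are frequencies.
    magnitude = np.abs(np.fft.rfft(frames, axis=1)).T
    freqs = np.fft.rfftfreq(win_len, d=1.0 / fs)
    times = (np.arange(n_frames) * hop + win_len / 2) / fs
    return freqs, times, magnitude

# Synthetic stand-in for an ECG trace: a 10 Hz sinusoid, 4 s at 250 Hz.
fs = 250
t = np.arange(1000) / fs
toy_signal = np.sin(2 * np.pi * 10 * t)
freqs, times, image = stft_magnitude(toy_signal, fs)
peak_freq = freqs[image.mean(axis=1).argmax()]
```

Here `image` is the two-dimensional array; in practice it would typically be log-scaled and resized before being passed to a CNN, but those choices are not specified in the abstract.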

CV · Dec 22, 2025
A Convolutional Neural Deferred Shader for Physics Based Rendering

Zhuo He, Yingdong Ru, Qianying Liu et al.

Recent advances in neural rendering have achieved impressive results on photorealistic shading and relighting by using a multilayer perceptron (MLP) as a regression model to learn the rendering equation from a real-world dataset. Such methods show promise for photorealistically relighting real-world objects, which is difficult for classical rendering because material ground truth is not easily obtained. However, significant challenges remain: the dense connections in MLPs result in a large number of parameters, which requires high computational resources, complicates training, and reduces performance during rendering. Data-driven approaches require large amounts of training data for generalization; unbalanced data might bias the model to ignore unusual illumination conditions, e.g. dark scenes. This paper introduces pbnds+, a novel physics-based neural deferred shading pipeline that uses convolutional neural networks to decrease the number of parameters and improve performance in shading and relighting tasks. Energy regularization is also proposed to restrict the model reflection during dark illumination. Extensive experiments demonstrate that our approach outperforms classical baselines, a state-of-the-art neural shading model, and a diffusion-based method.

CV · Mar 13
Composing Driving Worlds through Disentangled Control for Adversarial Scenario Generation

Yifan Zhan, Zhengqing Chen, Qingjie Wang et al.

A major challenge in autonomous driving is the "long tail" of safety-critical edge cases, which often emerge from unusual combinations of common traffic elements. Synthesizing these scenarios is crucial, yet current controllable generative models provide incomplete or entangled guidance, preventing the independent manipulation of scene structure, object identity, and ego actions. We introduce CompoSIA, a compositional driving video simulator that disentangles these traffic factors, enabling fine-grained control over diverse adversarial driving scenarios. To support controllable identity replacement of scene elements, we propose a noise-level identity injection, allowing pose-agnostic identity generation across diverse element poses, all from a single reference image. Furthermore, a hierarchical dual-branch action control mechanism is introduced to improve action controllability. Such disentangled control enables adversarial scenario synthesis: systematically combining safe elements into dangerous configurations that entangled generators cannot produce. Extensive comparisons demonstrate superior controllable generation quality over state-of-the-art baselines, with a 17% improvement in FVD for identity editing and reductions of 30% and 47% in rotation and translation errors for action control. Furthermore, downstream stress-testing reveals substantial planner failures: across editing modalities, the average collision rate of 3s increases by 173%.

LG · Jun 21, 2025
Learning Time-Aware Causal Representation for Model Generalization in Evolving Domains

Zhuo He, Shuang Li, Wenze Song et al.

Endowing deep models with the ability to generalize in dynamic scenarios is of vital significance for real-world deployment, given the continuous and complex changes in data distribution. Recently, evolving domain generalization (EDG) has emerged to address distribution shifts over time, aiming to capture evolving patterns for improved model generalization. However, existing EDG methods may suffer from spurious correlations by modeling only the dependence between data and targets across domains, creating a shortcut between task-irrelevant factors and the target, which hinders generalization. To this end, we design a time-aware structural causal model (SCM) that incorporates dynamic causal factors and causal mechanism drifts, and propose Static-DYNamic Causal Representation Learning (SYNC), an approach that effectively learns time-aware causal representations. Specifically, it integrates specially designed information-theoretic objectives into a sequential VAE framework that captures evolving patterns, and produces the desired representations by preserving intra-class compactness of causal factors both across and within domains. Moreover, we theoretically show that our method can yield the optimal causal predictor for each time domain. Results on both synthetic and real-world datasets show that SYNC can achieve superior temporal generalization performance.

CV · Dec 4, 2023
Few Clicks Suffice: Active Test-Time Adaptation for Semantic Segmentation

Longhui Yuan, Shuang Li, Zhuo He et al.

Test-time adaptation (TTA) adapts pre-trained models during inference using unlabeled test data and has received considerable research attention due to its potential practical value. Unfortunately, without any label supervision, existing TTA methods rely heavily on heuristics or empirical studies. As a result, the choice of where to update the model is either suboptimal or incurs greater computational cost. Meanwhile, there is still a significant performance gap between TTA approaches and their supervised counterparts. Motivated by active learning, in this work we propose the active test-time adaptation setup for semantic segmentation. Specifically, we introduce a human-in-the-loop pattern during the testing phase, which queries very few labels to facilitate predictions and model updates in an online manner. To do so, we propose a simple but effective ATASeg framework, which consists of two parts: a model adapter and a label annotator. Extensive experiments demonstrate that ATASeg bridges the performance gap between TTA methods and their supervised counterparts with extremely few annotations; even one click for labeling surpasses known SOTA TTA methods by 2.6% average mIoU on the ACDC benchmark. Empirical results imply that progress in either the model adapter or the label annotator will bring improvements to the ATASeg framework, giving it considerable research and practical potential.

CV · Apr 24, 2025
Generative Fields: Uncovering Hierarchical Feature Control for StyleGAN via Inverted Receptive Fields

Zhuo He, Paul Henderson, Nicolas Pugeault

StyleGAN has demonstrated the ability of GANs to synthesize highly realistic faces of imaginary people from random noise. One limitation of GAN-based image generation is the difficulty of controlling the features of the generated image, due to the strong entanglement of the low-dimensional latent space. Previous work that aimed to control StyleGAN with image or text prompts modulated sampling in W latent space, which is more expressive than Z latent space. However, W space still has restricted expressivity since it does not control the feature synthesis directly; in addition, the feature embedding in W space requires a pre-training process to reconstruct the style signal, limiting its application. This paper introduces the concept of "generative fields" to explain the hierarchical feature synthesis in StyleGAN, inspired by the receptive fields of convolutional neural networks (CNNs). Additionally, we propose a new image editing pipeline for StyleGAN using generative field theory and the channel-wise style latent space S, utilizing the intrinsic structural feature of CNNs to achieve disentangled control of feature synthesis at synthesis time.

CV · Apr 16, 2025
Beyond Reconstruction: A Physics Based Neural Deferred Shader for Photo-realistic Rendering

Zhuo He, Paul Henderson, Nicolas Pugeault

Deep learning based rendering has achieved major improvements in photo-realistic image synthesis, with potential applications including visual effects in movies and photo-realistic scene building in video games. However, a significant limitation is the difficulty of decomposing the illumination and material parameters, which limits such methods to reconstructing an input scene, without any possibility to control these parameters. This paper introduces a novel physics based neural deferred shading pipeline to decompose the data-driven rendering process and learn a generalizable shading function that produces photo-realistic results for shading and relighting tasks; we also propose a shadow estimator to efficiently mimic shadowing effects. Our model achieves improved performance compared to classical models and a state-of-the-art neural shading model, and enables generalizable photo-realistic shading from arbitrary illumination input.

CV · May 4, 2023
A new method using deep learning to predict the response to cardiac resynchronization therapy

Kristoffer Larsen, Zhuo He, Chen Zhao et al.

Background. Clinical parameters measured from gated single-photon emission computed tomography myocardial perfusion imaging (SPECT MPI) have value in predicting cardiac resynchronization therapy (CRT) patient outcomes, but still show limitations. The purpose of this study is to combine clinical variables, features from electrocardiogram (ECG), and parameters from assessment of cardiac function with polarmaps from gated SPECT MPI through deep learning (DL) to predict CRT response. Methods. 218 patients who underwent rest gated SPECT MPI were enrolled in this study. CRT response was defined as an increase in left ventricular ejection fraction (LVEF) > 5% at a 6-month follow up. A DL model was constructed by combining a pre-trained VGG16 module and a multilayer perceptron. Two modalities of data were input to the model: polarmap images from SPECT MPI and tabular data from clinical features and ECG parameters. Gradient-weighted Class Activation Mapping (Grad-CAM) was applied to the VGG16 module to provide explainability for the polarmaps. For comparison, four machine learning (ML) models were trained using only the tabular features. Results. Modeling was performed on 218 patients who underwent CRT implantation with a response rate of 55.5% (n = 121). The DL model demonstrated average AUC (0.83), accuracy (0.73), sensitivity (0.76), and specificity (0.69) surpassing the ML models and guideline criteria. Guideline recommendations presented accuracy (0.53), sensitivity (0.75), and specificity (0.26). Conclusions. The DL model outperformed the ML models, showcasing the additional predictive benefit of utilizing SPECT MPI polarmaps. Incorporating additional patient data directly in the form of medical imagery can improve CRT response prediction.
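
The accuracy, sensitivity, and specificity figures reported in this abstract follow the standard confusion-matrix definitions. The sketch below shows those definitions on a small made-up label vector; the function name `binary_metrics` and the toy labels are assumptions for illustration and do not reproduce any data from the study.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity, and specificity for binary labels (1 = responder)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        # Sensitivity: fraction of true responders that were identified.
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        # Specificity: fraction of true non-responders that were identified.
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }

# Toy labels only, not patient data.
toy_true = [1, 1, 1, 0, 0, 0, 1, 0]
toy_pred = [1, 1, 0, 0, 1, 0, 1, 0]
metrics = binary_metrics(toy_true, toy_pred)
```

A guideline criterion with high sensitivity but low specificity, as reported above, corresponds to a rule that labels most patients as responders: few responders are missed, but many non-responders are misclassified.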

IV · Oct 11, 2021
Spatial-temporal V-Net for automatic segmentation and quantification of right ventricles in gated myocardial perfusion SPECT images

Chen Zhao, Shi Shi, Zhuo He et al.

Background. Functional assessment of the right ventricle (RV) using gated myocardial perfusion single-photon emission computed tomography (MPS) relies heavily on the precise extraction of right ventricular contours. In this paper, we present a new deep-learning-based model integrating both the spatial and temporal features in gated MPS images to perform the segmentation of the RV epicardium and endocardium. Methods. By integrating the spatial features from each cardiac frame of the gated MPS and the temporal features from the sequential cardiac frames of the gated MPS, we developed a Spatial-Temporal V-Net (ST-VNet) for automatic extraction of RV endocardial and epicardial contours. In the ST-VNet, a V-Net is employed to hierarchically extract spatial features, and convolutional long short-term memory (ConvLSTM) units are added to the skip-connection pathway to extract the temporal features. The input of the ST-VNet is ECG-gated sequential frames of the MPS images and the output is the probability map of the epicardial or endocardial masks. A Dice similarity coefficient (DSC) loss, which penalizes the discrepancy between the model prediction and the ground truth, was adopted to optimize the segmentation model. Results. Our segmentation model was trained and validated on a retrospective dataset with 45 subjects, and the cardiac cycle of each subject was divided into 8 gates. The proposed ST-VNet achieved a DSC of 0.8914 and 0.8157 for the RV epicardium and endocardium segmentation, respectively. The mean absolute error, the mean squared error, and the Pearson correlation coefficient of the RV ejection fraction (RVEF) between the ground truth and the model prediction were 0.0609, 0.0830, and 0.6985, respectively. Conclusion. Our proposed ST-VNet is an effective model for RV segmentation. It has great promise for clinical use in RV functional assessment.
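
The Dice similarity coefficient and the loss derived from it can be sketched in a few lines. The abstract does not give the exact loss formulation, so the version below is one common "soft" variant that accepts probabilities; the function names `soft_dice` and `dice_loss`, the handling of empty masks, and the toy masks are assumptions.

```python
def soft_dice(pred, truth):
    """Soft Dice coefficient between a flattened probability map and a binary mask.

    DSC = 2 * sum(p * t) / (sum(p) + sum(t)); defined as 1.0 when both are empty.
    """
    if len(pred) != len(truth):
        raise ValueError("pred and truth must have the same length")
    overlap = sum(p * t for p, t in zip(pred, truth))
    total = sum(pred) + sum(truth)
    return 1.0 if total == 0 else 2.0 * overlap / total

def dice_loss(pred, truth):
    """Loss that decreases as the overlap between prediction and mask grows."""
    return 1.0 - soft_dice(pred, truth)

# Toy flattened masks: the prediction covers two pixels, one of them correct.
toy_pred = [1.0, 1.0, 0.0, 0.0]
toy_truth = [1, 0, 0, 0]
toy_dsc = soft_dice(toy_pred, toy_truth)    # 2 * 1 / (2 + 1)
toy_loss = dice_loss(toy_pred, toy_truth)
```

Because the coefficient is normalized by the total foreground size, it is less dominated by the large background region than a per-pixel accuracy would be, which is one reason Dice-based losses are widely used for small structures.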

MED-PH · Jun 1, 2021
A method using deep learning to discover new predictors of CRT response from mechanical dyssynchrony on gated SPECT MPI

Zhuo He, Xinwei Zhang, Chen Zhao et al.

Background. Studies have shown that the conventional left ventricular mechanical dyssynchrony (LVMD) parameters have their own statistical limitations. The purpose of this study is to extract new LVMD parameters from the phase analysis of gated SPECT MPI by deep learning to help CRT patient selection. Methods. One hundred and three patients who underwent rest gated SPECT MPI were enrolled in this study. CRT response was defined as a decrease in left ventricular end-systolic volume (LVESV) ≥ 15% at 6 ± 1 month follow-up. An autoencoder (AE), an unsupervised deep learning method, was trained on the raw LV systolic phase polar maps to extract new LVMD parameters, called AE-based LVMD parameters. Correlation analysis was used to explain the relationships between the new parameters and conventional LVMD parameters. Univariate and multivariate analyses were used to establish a multivariate model for predicting CRT response. Results. Complete data were obtained in 102 patients; 44.1% of them were classified as CRT responders. The AE-based LVMD parameter was significant in the univariate (OR 1.24, 95% CI 1.07 - 1.44, P = 0.006) and multivariate analyses (OR 1.03, 95% CI 1.01 - 1.06, P = 0.006). Moreover, it had incremental value over PSD (AUC 0.72 vs. 0.63, LH 8.06, P = 0.005) and PBW (AUC 0.72 vs. 0.64, LH 7.87, P = 0.005), combined with significant clinical characteristics, including LVEF and gender. Conclusions. The new LVMD parameters extracted by autoencoder from the baseline gated SPECT MPI have the potential to improve the prediction of CRT response.

IV · Jan 25, 2021
A new approach to extracting coronary arteries and detecting stenosis in invasive coronary angiograms

Chen Zhao, Haipeng Tang, Daniel McGonigle et al.

In stable coronary artery disease (CAD), reduction in mortality and/or myocardial infarction with revascularization over medical therapy has not been reliably achieved. Coronary arteries are usually extracted to perform stenosis detection. We aim to develop an automatic deep learning algorithm to extract coronary arteries from invasive coronary angiograms (ICAs). In this study, a multi-input and multi-scale (MIMS) U-Net with a two-stage recurrent training strategy was proposed for automatic vessel segmentation. Incorporating features such as the Inception residual module with depth-wise separable convolutional layers, the proposed model generated a refined prediction map with the following two training stages: (i) Stage I coarsely segmented the major coronary arteries from pre-processed single-channel ICAs and generated the probability map of vessels; (ii) during Stage II, a three-channel image consisting of the original preprocessed image, a generated probability map, and an edge-enhanced image generated from the preprocessed image was fed to the proposed MIMS U-Net to produce the final segmentation probability map. During the training stage, the probability maps were iteratively and recurrently updated by feeding them into the neural network. After segmentation, an arterial stenosis detection algorithm was developed to extract vascular centerlines and calculate arterial diameters to evaluate stenotic level. Experimental results demonstrated that the proposed method achieved an average Dice score of 0.8329, an average sensitivity of 0.8281, and an average specificity of 0.9979 in our dataset with 294 ICAs obtained from 73 patients. Moreover, our stenosis detection algorithm achieved a true positive rate of 0.6668 and a positive predictive value of 0.7043.