Zongyu Li

h-index: 10

14 papers

88 citations

Novelty: 45%

AI Score: 50

Ranked #44,290 of 201,326 authors (top 22%); #16,941 in CV (top 29%)

14 Papers

ML · Sep 19, 2022

A Survey of Deep Causal Models and Their Industrial Applications

Zongyu Li, Xiaobo Guo, Siwei Qiang

The notion of causality assumes a paramount position within the realm of human cognition. Over the past few decades, there has been significant advancement in the domain of causal effect estimation across various disciplines, including but not limited to computer science, medicine, economics, and industrial applications. Given the continuous advancements in deep learning methodologies, there has been a notable surge in their utilization for the estimation of causal effects using counterfactual data. Typically, deep causal models map the characteristics of covariates to a representation space and then design various objective functions to estimate counterfactual data unbiasedly. Different from the existing surveys on causal models in machine learning, this review mainly focuses on the overview of the deep causal models based on neural networks, and its core contributions are as follows: 1) we provide a comprehensive overview of deep causal models from both the development-timeline and method-classification perspectives; 2) we outline some typical applications of causal effect estimation to industry; 3) we also endeavor to present a detailed categorization and analysis of relevant datasets, source codes and experiments.

IV · Sep 12, 2024 · Code

MedSegMamba: 3D CNN-Mamba Hybrid Architecture for Brain Segmentation

Aaron Cao, Zongyu Li, Jordan Jomsky et al.

Widely used traditional pipelines for subcortical brain segmentation are often inefficient and slow, particularly when processing large datasets. Furthermore, deep learning models face challenges due to the high resolution of MRI images and the large number of anatomical classes involved. To address these limitations, we developed a 3D patch-based hybrid CNN-Mamba model that leverages Mamba's selective scan algorithm, thereby enhancing segmentation accuracy and efficiency for 3D inputs. This retrospective study utilized 1784 T1-weighted MRI scans from a diverse, multi-site dataset of healthy individuals. The dataset was divided into training, validation, and testing sets with a 1076/345/363 split. The scans were obtained from 1.5T and 3T MRI machines. Our model's performance was validated against several benchmarks, including other CNN-Mamba, CNN-Transformer, and pure CNN networks, using FreeSurfer-generated ground truths. We employed the Dice Similarity Coefficient (DSC), Volume Similarity (VS), and Average Symmetric Surface Distance (ASSD) as evaluation metrics. Statistical significance was determined using the Wilcoxon signed-rank test with a threshold of P < 0.05. The proposed model achieved the highest overall performance across all metrics (DSC 0.88383; VS 0.97076; ASSD 0.33604), significantly outperforming all non-Mamba-based models (P < 0.001). While the model did not show significant improvement in DSC or VS compared to another Mamba-based model (P-values of 0.114 and 0.425), it demonstrated a significant enhancement in ASSD (P < 0.001) with approximately 20% fewer parameters. In conclusion, our proposed hybrid CNN-Mamba architecture offers an efficient and accurate approach for 3D subcortical brain segmentation, demonstrating potential advantages over existing methods. Code is available at: https://github.com/aaroncao06/MedSegMamba.
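For readers unfamiliar with the overlap metrics named in this abstract, the sketch below computes the Dice Similarity Coefficient and Volume Similarity for a single binary label, using their standard textbook definitions. It is an illustration only, not the authors' evaluation code: the function names and the flattened 0/1 mask representation are assumptions, and ASSD is omitted because it needs surface extraction.

```python
def dice_coefficient(pred, truth):
    """DSC = 2|A ∩ B| / (|A| + |B|) for two flattened 0/1 masks."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Convention assumed here: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


def volume_similarity(pred, truth):
    """VS = 1 - ||A| - |B|| / (|A| + |B|); compares sizes, not locations."""
    vol_pred = sum(1 for p in pred if p)
    vol_truth = sum(1 for t in truth if t)
    total = vol_pred + vol_truth
    return 1.0 if total == 0 else 1.0 - abs(vol_pred - vol_truth) / total


# Toy example: six voxels, flattened (hypothetical data).
pred = [1, 1, 0, 0, 1, 0]
truth = [1, 0, 0, 0, 1, 1]
print(dice_coefficient(pred, truth))
print(volume_similarity(pred, truth))
```

Because Volume Similarity compares only the sizes of the two regions, it can be high even when the overlap is imperfect, which is why it is usually reported alongside an overlap metric such as DSC.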

CV · Mar 1, 2022

Runtime Detection of Executional Errors in Robot-Assisted Surgery

Zongyu Li, Kay Hutchinson, Homa Alemzadeh

Despite significant developments in the design of surgical robots and automated techniques for objective evaluation of surgical skills, there are still challenges in ensuring safety in robot-assisted minimally-invasive surgery (RMIS). This paper presents a runtime monitoring system for the detection of executional errors during surgical tasks through the analysis of kinematic data. The proposed system incorporates dual Siamese neural networks and knowledge of surgical context, including surgical tasks and gestures, their distributional similarities, and common error modes, to learn the differences between normal and erroneous surgical trajectories from small training datasets. We evaluate the performance of the error detection using Siamese networks compared to single CNN and LSTM networks trained with different levels of contextual knowledge and training data, using the dry-lab demonstrations of the Suturing and Needle Passing tasks from the JIGSAWS dataset. Our results show that gesture-specific, task-nonspecific Siamese networks obtain micro F1 scores of 0.94 (Siamese-CNN) and 0.95 (Siamese-LSTM), and perform better than single CNN (0.86) and LSTM (0.87) networks. These Siamese networks also outperform gesture-nonspecific, task-specific Siamese-CNN and Siamese-LSTM models for Suturing and Needle Passing.

CV · Feb 28, 2023

Towards Surgical Context Inference and Translation to Gestures

Kay Hutchinson, Zongyu Li, Ian Reyes et al.

Manual labeling of gestures in robot-assisted surgery is labor-intensive, prone to errors, and requires expertise or training. We propose a method for automated and explainable generation of gesture transcripts that leverages the abundance of data for image segmentation. Surgical context is detected using segmentation masks by examining the distances and intersections between the tools and objects. Next, context labels are translated into gesture transcripts using knowledge-based Finite State Machine (FSM) and data-driven Long Short-Term Memory (LSTM) models. We evaluate the performance of each stage of our method by comparing the results with the ground truth segmentation masks, the consensus context labels, and the gesture labels in the JIGSAWS dataset. Our results show that our segmentation models achieve state-of-the-art performance in recognizing needle and thread in Suturing and we can automatically detect important surgical states with high agreement with crowd-sourced labels (e.g., contact between graspers and objects in Suturing). We also find that the FSM models are more robust to poor segmentation and labeling performance than LSTMs. Our proposed method can significantly shorten the gesture labeling process (by a factor of about 2.8).
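The idea of reading surgical context off segmentation masks "by examining the distances and intersections between the tools and objects" can be sketched roughly as follows. This is a toy illustration, not the authors' pipeline: the function names, the brute-force distance search, the two-state label set, and the `contact_threshold` value are all hypothetical.

```python
from itertools import product
from math import hypot


def mask_pixels(mask):
    """Return the set of (row, col) coordinates where a 2D 0/1 mask is set."""
    return {(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v}


def min_distance(pixels_a, pixels_b):
    """Smallest Euclidean distance between two non-empty pixel sets."""
    return min(hypot(ra - rb, ca - cb)
               for (ra, ca), (rb, cb) in product(pixels_a, pixels_b))


def infer_contact(tool_mask, object_mask, contact_threshold=1.5):
    """Label a tool/object pair as 'contact' or 'no_contact'.

    contact_threshold (in pixels) is a hypothetical value for illustration.
    """
    tool, obj = mask_pixels(tool_mask), mask_pixels(object_mask)
    if not tool or not obj:
        return "no_contact"   # one of the two is not visible in this frame
    if tool & obj:
        return "contact"      # the masks intersect
    close = min_distance(tool, obj) <= contact_threshold
    return "contact" if close else "no_contact"


# Hypothetical 3x4 frames: a grasper next to a needle, then far from it.
grasper = [[1, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
needle_near = [[0, 0, 1, 0], [0, 0, 1, 0], [0, 0, 0, 0]]
needle_far = [[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 1]]
print(infer_contact(grasper, needle_near))
print(infer_contact(grasper, needle_far))
```

Per-frame labels of this kind are the sort of context sequence that a finite state machine or LSTM could then translate into gestures.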

CV · Aug 24, 2023

Robotic Scene Segmentation with Memory Network for Runtime Surgical Context Inference

Zongyu Li, Ian Reyes, Homa Alemzadeh

Surgical context inference has recently garnered significant attention in robot-assisted surgery as it can facilitate workflow analysis, skill assessment, and error detection. However, runtime context inference is challenging since it requires timely and accurate detection of the interactions among the tools and objects in the surgical scene based on the segmentation of video data. On the other hand, existing state-of-the-art video segmentation methods are often biased against infrequent classes and fail to provide temporal consistency for segmented masks. This can negatively impact the context inference and accurate detection of critical states. In this study, we propose a solution to these challenges using a Space Time Correspondence Network (STCN). STCN is a memory network that performs binary segmentation and minimizes the effects of class imbalance. The use of a memory bank in STCN allows for the utilization of past image and segmentation information, thereby ensuring consistency of the masks. Our experiments using the publicly available JIGSAWS dataset demonstrate that STCN achieves superior segmentation performance for objects that are difficult to segment, such as needle and thread, and improves context inference compared to the state-of-the-art. We also demonstrate that segmentation and context inference can be performed at runtime without compromising performance.

LG · May 20

Robust Recommendation from Noisy Implicit Feedback: A GMM-Weighted Bayes-label Transition Matrix Framework

Zongyu Li, Xuanyu Liu, Gongce Cao et al.

Learning from implicit feedback in recommender systems is fundamentally challenged by pervasive label noise. While conventional denoising approaches often discard noisy instances to ensure robustness, this strategy inevitably suffers from low data utilization. Alternative methods that employ a Bayes-label transition matrix (BLTM) can leverage all available data, but their estimates tend to be biased in practical recommendation scenarios. To address these limitations, this paper proposes a Robust GMM-weighted Bayes-label Transition Matrix framework (RGBT). Our solution utilizes a Gaussian Mixture Model (GMM) to derive instance-specific reliability scores, which systematically calibrate the BLTM estimation to mitigate bias. Theoretical analysis confirms that our approach, by leveraging the BLTM framework with GMM calibration, simultaneously ensures full sample utilization, delivers consistent estimation, and critically, achieves a significant reduction in estimation variance. Extensive experiments on multiple real-world and synthetically flipped datasets demonstrate that RGBT not only utilizes noisy samples more effectively than mainstream reliable sample-based denoising methods, but also achieves significantly superior calibration capability of the transition matrix compared to state-of-the-art transition matrix-based denoising approaches.

LG · May 20

Robust Personalized Recommendation under Hidden Confounding in MNAR

Zongyu Li, Wanting Su, Tianyu Xia

Recommender systems often rely on observational user-item interaction data, which is prone to selection bias due to users' selective interactions with items. Inverse propensity weighting and doubly robust estimators effectively mitigate selection bias under observed confounding, but are unreliable in the presence of hidden confounders. Existing approaches relying on randomized controlled trials (RCTs) or global sensitivity bounds are constrained in practice: RCTs demand costly experimental data, while global sensitivity bounds presume a uniformly bounded effect of unmeasured confounders on propensities through sensitivity analysis, thereby neglecting heterogeneity across user-item interactions. To overcome this limitation, we propose a novel framework, the Personalized Unobserved-Confounding-aware Interaction Deconfounder (PUID), which estimates user-item level sensitivity bounds, thereby substantially relaxing the homogeneity assumption inherent in global sensitivity bounds. To ensure both robustness and predictive accuracy, we further develop an adversarial optimization strategy and propose a benchmark-guided variant (BPUID) that incorporates pre-trained models as stabilizing references. Extensive experiments on three real-world datasets demonstrate that our approach significantly outperforms global methods under hidden confounding, without requiring RCT data.
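As background for the inverse propensity weighting baseline this abstract mentions, the sketch below shows the standard inverse-propensity-scored (IPS) estimate of average prediction error over all user-item pairs. It is not PUID or BPUID: it assumes the propensities are known and correct, which is exactly the assumption that hidden confounding breaks. All names and numbers are hypothetical.

```python
def naive_error(errors, observed):
    """Average error over observed user-item pairs only (ignores selection)."""
    seen = [e for e, o in zip(errors, observed) if o]
    return sum(seen) / len(seen)


def ips_error(errors, observed, propensities):
    """IPS estimate: (1/N) * sum_i o_i * e_i / p_i over all N pairs.

    errors[i]       prediction error for pair i (only used when observed)
    observed[i]     1 if the interaction was observed, else 0
    propensities[i] assumed probability that pair i is observed
    """
    n = len(errors)
    weighted = sum(e / p for e, o, p in zip(errors, observed, propensities) if o)
    return weighted / n


# Hypothetical data: pairs with larger errors are less likely to be observed.
errors = [1.0, 2.0, 3.0, 4.0]
observed = [1, 1, 1, 0]
propensities = [0.8, 0.8, 0.4, 0.4]
print(naive_error(errors, observed))
print(ips_error(errors, observed, propensities))
```

On this toy sample the naive average is pulled toward the frequently observed, low-error pairs, while the IPS estimate up-weights the rarely observed pair. IPS is unbiased only in expectation and only when the propensities are right, so a single sample need not match the true mean.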

IV · Dec 18, 2023 · Code

Deep Learning-based MRI Reconstruction with Artificial Fourier Transform Network (AFTNet)

Yanting Yang, Yiren Zhang, Zongyu Li et al.

Deep complex-valued neural networks (CVNNs) provide a powerful way to leverage complex number operations and representations and have succeeded in several phase-based applications. However, previous networks have not fully explored the impact of complex-valued networks in the frequency domain. Here, we introduce a unified complex-valued deep learning framework, the Artificial Fourier Transform Network (AFTNet), which combines domain-manifold learning and CVNNs. AFTNet can be readily used to solve image inverse problems in domain transformation, especially for accelerated magnetic resonance imaging (MRI) reconstruction and other applications. While conventional methods typically utilize magnitude images or treat the real and imaginary components of k-space data as separate channels, our approach directly processes raw k-space data in the frequency domain, utilizing complex-valued operations. This allows for a mapping between the frequency (k-space) and image domain to be determined through cross-domain learning. We show that AFTNet achieves superior accelerated MRI reconstruction compared to existing approaches. Furthermore, our approach can be applied to various tasks, such as denoised magnetic resonance spectroscopy (MRS) reconstruction and datasets with various contrasts. The AFTNet presented here is a valuable preprocessing component for different preclinical studies and provides an innovative alternative for solving inverse problems in imaging and spectroscopy. The code is available at: https://github.com/yanting-yang/AFT-Net.

IV · Mar 23

Cycle Inverse-Consistent TransMorph: A Balanced Deep Learning Framework for Brain MRI Registration

Jiaqi Shang, Haojin Wu, Yinyi Lai et al.

Deformable image registration plays a fundamental role in medical image analysis by enabling spatial alignment of anatomical structures across subjects. While recent deep learning-based approaches have significantly improved computational efficiency, many existing methods remain limited in capturing long-range anatomical correspondence and maintaining deformation consistency. In this work, we present Cycle Inverse-Consistent TransMorph (CICTM), a cycle inverse-consistent transformer-based framework for deformable brain MRI registration. The model integrates a Swin-UNet architecture with bidirectional consistency constraints, enabling the joint estimation of forward and backward deformation fields. This design allows the framework to capture both local anatomical details and global spatial relationships while improving deformation stability. We conduct a comprehensive evaluation of the proposed framework on a large multi-center dataset consisting of 2851 T1-weighted brain MRI scans aggregated from 13 public datasets. Experimental results demonstrate that CICTM achieves consistently strong and balanced performance across multiple quantitative evaluation metrics while maintaining stable and physically plausible deformation fields. Detailed quantitative comparisons with baseline methods, including ANTs, ICNet, and VoxelMorph, are provided in the appendix. These properties make the proposed framework suitable for large-scale neuroimaging datasets where both accuracy and deformation stability are critical.

LG · Apr 24

From Local to Cluster: A Unified Framework for Causal Discovery with Latent Variables

Zongyu Li

Latent variables pose a fundamental challenge to causal discovery and inference. Conventional local methods focus on direct neighbors but fail to provide macro-level insights. Cluster-level methods enable macro causal reasoning but either assume clusters are known a priori or require causal sufficiency. Moreover, directly applying single-variable causal discovery methods to cluster-level problems violates causal sufficiency and leads to incorrect results. To overcome these limitations, this paper proposes L2C (Local to Cluster Causal Abstraction), a unified framework that bridges local structure learning and cluster-level causal discovery. Unlike prior work that requires a complete manual assignment of micro variables to clusters, L2C discovers the partition automatically from local causal patterns. Our solution leverages a cluster reduction theorem to reduce any cluster to at most three nodes without loss of causal information, applies local causal discovery to identify direct causes, effects, and V-structures in the presence of latent variables, and performs macro-level causal inference via cluster-level calculus on the learned cluster graph. L2C does not assume causal sufficiency, as latent variables are handled through local discovery. Theoretical analysis shows that L2C ensures soundness, atomic completeness, and computational efficiency. Extensive experiments on synthetic and real-world data demonstrate that L2C accurately recovers ground-truth clusters and achieves superior macro causal effect identification compared to existing baselines.

CV · Oct 29, 2024

Advancing Efficient Brain Tumor Multi-Class Classification -- New Insights from the Vision Mamba Model in Transfer Learning

Yinyi Lai, Anbo Cao, Yuan Gao et al.

Early and accurate diagnosis of brain tumors is crucial for improving patient survival rates. However, the detection and classification of brain tumors are challenging due to their diverse types and complex morphological characteristics. This study investigates the application of pre-trained models for brain tumor classification, with a particular focus on deploying the Mamba model. We fine-tuned several mainstream transfer learning models and applied them to the multi-class classification of brain tumors. By comparing these models to those trained from scratch, we demonstrated the significant advantages of transfer learning, especially in the medical imaging field, where annotated data is often limited. Notably, we introduced the Vision Mamba (Vim), a novel network architecture, and applied it for the first time in brain tumor classification, achieving exceptional classification accuracy. Experimental results indicate that the Vim model achieved 100% classification accuracy on an independent test set, emphasizing its potential for tumor classification tasks. These findings underscore the effectiveness of transfer learning in brain tumor classification and reveal that, compared to existing state-of-the-art models, the Vim model is lightweight, efficient, and highly accurate, offering a new perspective for clinical applications. Furthermore, the framework proposed in this study for brain tumor classification, based on transfer learning and the Vision Mamba model, is broadly applicable to other medical imaging classification problems.

IV · Dec 1, 2024

Enhancing Brain Age Estimation with a Multimodal 3D CNN Approach Combining Structural MRI and AI-Synthesized Cerebral Blood Volume Data

Jordan Jomsky, Zongyu Li, Yiren Zhang et al.

The increasing global aging population necessitates improved methods to assess brain aging and its related neurodegenerative changes. Brain Age Gap Estimation (BrainAGE) offers a neuroimaging biomarker for understanding these changes by predicting brain age from MRI scans. Current approaches primarily use T1-weighted magnetic resonance imaging (T1w MRI) data, capturing only structural brain information. To address this limitation, AI-generated Cerebral Blood Volume (AICBV) data, synthesized from non-contrast MRI scans, offers functional insights by revealing subtle blood-tissue contrasts otherwise undetectable in standard imaging. We integrated AICBV with T1w MRI to predict brain age, combining both structural and functional metrics. We developed a deep learning model using a VGG-based architecture for both modalities and combined their predictions using linear regression. Our model achieved a mean absolute error (MAE) of 3.95 years and an $R^2$ of 0.943 on the test set ($n = 288$), outperforming existing models trained on similar data. We have further created gradient-based class activation maps (Grad-CAM) to visualize the regions of the brain that most influenced the model's predictions, providing interpretable insights into the structural and functional contributors to brain aging.
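The two headline figures in this abstract, mean absolute error and $R^2$, can be computed as in the short sketch below, which uses the usual definitions (MAE as the mean of absolute residuals, $R^2$ as one minus the ratio of residual to total sum of squares). The function names and the ages are made up for illustration and are unrelated to the paper's data.

```python
def mean_absolute_error(y_true, y_pred):
    """MAE = (1/n) * sum |y_i - yhat_i|, in the same units as y (years here)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot (coefficient of determination)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical chronological ages and predicted brain ages, in years.
chronological_age = [20.0, 30.0, 40.0, 50.0]
predicted_age = [22.0, 28.0, 41.0, 49.0]
print(mean_absolute_error(chronological_age, predicted_age))
print(r_squared(chronological_age, predicted_age))
```

Note that $R^2$ depends on the spread of ages in the test set, so values from cohorts with different age ranges are not directly comparable.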

SP · May 12, 2023

Poisson-Gaussian Holographic Phase Retrieval with Score-based Image Prior

Zongyu Li, Jason Hu, Xiaojian Xu et al.

Phase retrieval (PR) is a crucial problem in many imaging applications. This study focuses on resolving the holographic phase retrieval problem in situations where the measurements are affected by a combination of Poisson and Gaussian noise, which commonly occurs in optical imaging systems. To address this problem, we propose a new algorithm called "AWFS" that uses the accelerated Wirtinger flow (AWF) with a score function as a generative prior. Specifically, we formulate the PR problem as an optimization problem that incorporates both data fidelity and regularization terms. We calculate the gradient of the log-likelihood function for PR and determine its corresponding Lipschitz constant. Additionally, we introduce a generative prior in our regularization framework by using score matching to capture information about the gradient of image prior distributions. We provide theoretical analysis that establishes a critical-point convergence guarantee for the proposed algorithm. The results of our simulation experiments on three different datasets show the following: 1) By using the PG likelihood model, the proposed algorithm improves reconstruction compared to algorithms based solely on Gaussian or Poisson likelihood. 2) The proposed score-based image prior method performs better than the method based on the denoising diffusion probabilistic model (DDPM), as well as plug-and-play alternating direction method of multipliers (PnP-ADMM) and regularization by denoising (RED).

RO · Jun 22, 2021

Analysis of Executional and Procedural Errors in Dry-lab Robotic Surgery Experiments

Kay Hutchinson, Zongyu Li, Leigh A. Cantrell et al.

Background: Analyzing kinematic and video data can help identify potentially erroneous motions that lead to sub-optimal surgeon performance and safety-critical events in robot-assisted surgery. Methods: We develop a rubric for identifying task and gesture-specific Executional and Procedural errors and evaluate dry-lab demonstrations of Suturing and Needle Passing tasks from the JIGSAWS dataset. We characterize erroneous parts of demonstrations by labeling video data, and use distribution similarity analysis and trajectory averaging on kinematic data to identify parameters that distinguish erroneous gestures. Results: Executional error frequency varies by task and gesture, and correlates with skill level. Some predominant error modes in each gesture are distinguishable by analyzing error-specific kinematic parameters. Procedural errors could lead to lower performance scores and increased demonstration times but also depend on surgical style. Conclusions: This study provides insights into context-dependent errors that can be used to design automated error detection mechanisms and improve training and skill assessment.