Yinghao Zhang

IV
h-index30
15papers
902citations
Novelty52%
AI Score48

15 Papers

CVSep 14, 2022
SCULPTOR: Skeleton-Consistent Face Creation Using a Learned Parametric Generator

Zesong Qiu, Yuwei Li, Dongming He et al.

Recent years have seen growing interest in 3D human faces modelling due to its wide applications in digital human, character generation and animation. Existing approaches overwhelmingly emphasized on modeling the exterior shapes, textures and skin properties of faces, ignoring the inherent correlation between inner skeletal structures and appearance. In this paper, we present SCULPTOR, 3D face creations with Skeleton Consistency Using a Learned Parametric facial generaTOR, aiming to facilitate easy creation of both anatomically correct and visually convincing face models via a hybrid parametric-physical representation. At the core of SCULPTOR is LUCY, the first large-scale shape-skeleton face dataset in collaboration with plastic surgeons. Named after the fossils of one of the oldest known human ancestors, our LUCY dataset contains high-quality Computed Tomography (CT) scans of the complete human head before and after orthognathic surgeries, critical for evaluating surgery results. LUCY consists of 144 scans of 72 subjects (31 male and 41 female) where each subject has two CT scans taken pre- and post-orthognathic operations. Based on our LUCY dataset, we learn a novel skeleton consistent parametric facial generator, SCULPTOR, which can create the unique and nuanced facial features that help define a character and at the same time maintain physiological soundness. Our SCULPTOR jointly models the skull, face geometry and face appearance under a unified data-driven framework, by separating the depiction of a 3D face into shape blend shape, pose blend shape and facial expression blend shape. SCULPTOR preserves both anatomic correctness and visual realism in facial generation tasks compared with existing methods. Finally, we showcase the robustness and effectiveness of SCULPTOR in various fancy applications unseen before.

IVJul 19, 2023Code
Deep unrolling Shrinkage Network for Dynamic MR imaging

Yinghao Zhang, Xiaodi Li, Weihang Li et al.

Deep unrolling networks that utilize sparsity priors have achieved great success in dynamic magnetic resonance (MR) imaging. The convolutional neural network (CNN) is usually utilized to extract the transformed domain, and then the soft thresholding (ST) operator is applied to the CNN-transformed data to enforce the sparsity priors. However, the ST operator is usually constrained to be the same across all channels of the CNN-transformed data. In this paper, we propose a novel operator, called soft thresholding with channel attention (AST), that learns the threshold for each channel. In particular, we put forward a novel deep unrolling shrinkage network (DUS-Net) by unrolling the alternating direction method of multipliers (ADMM) for optimizing the transformed $l_1$ norm dynamic MR reconstruction model. Experimental results on an open-access dynamic cine MR dataset demonstrate that the proposed DUS-Net outperforms the state-of-the-art methods. The source code is available at \url{https://github.com/yhao-z/DUS-Net}.

IVJun 2, 2022
Dynamic MRI using Learned Transform-based Tensor Low-Rank Network (LT$^2$LR-Net)

Yinghao Zhang, Peng Li, Yue Hu

While low-rank matrix prior has been exploited in dynamic MR image reconstruction and has obtained satisfying performance, tensor low-rank models have recently emerged as powerful alternative representations for three-dimensional dynamic MR datasets. In this paper, we introduce a novel deep unrolling network for dynamic MRI, namely the learned transform-based tensor low-rank network (LT$^2$LR-Net). First, we generalize the tensor singular value decomposition (t-SVD) into an arbitrary unitary transform-based version and subsequently propose the novel transformed tensor nuclear norm (TTNN). Then, we design a novel TTNN-based iterative optimization algorithm based on the alternating direction method of multipliers (ADMM) to exploit the tensor low-rank prior in the transformed domain. The corresponding iterative steps are unrolled into the proposed LT$^2$LR-Net, where the convolutional neural network (CNN) is incorporated to adaptively learn the transformation from the dynamic MR dataset for more robust and accurate tensor low-rank representations. Experimental results on the cardiac cine MR dataset demonstrate that the proposed framework can provide improved recovery results compared with the state-of-the-art methods.

IVJun 2, 2022
Dynamic Cardiac MRI Reconstruction Using Combined Tensor Nuclear Norm and Casorati Matrix Nuclear Norm Regularizations

Yinghao Zhang, Yue Hu

Low-rank tensor models have been applied in accelerating dynamic magnetic resonance imaging (dMRI). Recently, a new tensor nuclear norm based on t-SVD has been proposed and applied to tensor completion. Inspired by the different properties of the tensor nuclear norm (TNN) and the Casorati matrix nuclear norm (MNN), we introduce a combined TNN and Casorati MNN regularizations framework to reconstruct dMRI, which we term as TMNN. The proposed method simultaneously exploits the spatial structure and the temporal correlation of the dynamic MR data. The optimization problem can be efficiently solved by the alternating direction method of multipliers (ADMM). In order to further improve the computational efficiency, we develop a fast algorithm under the Cartesian sampling scenario. Numerical experiments based on cardiac cine MRI and perfusion MRI data demonstrate the performance improvement over the traditional Casorati nuclear norm regularization method.

IVSep 8, 2022
T2LR-Net: An unrolling network learning transformed tensor low-rank prior for dynamic MR image reconstruction

Yinghao Zhang, Peng Li, Yue Hu

The tensor low-rank prior has attracted considerable attention in dynamic MR reconstruction. Tensor low-rank methods preserve the inherent high-dimensional structure of data, allowing for improved extraction and utilization of intrinsic low-rank characteristics. However, most current methods are still confined to utilizing low-rank structures either in the image domain or predefined transformed domains. Designing an optimal transformation adaptable to dynamic MRI reconstruction through manual efforts is inherently challenging. In this paper, we propose a deep unrolling network that utilizes the convolutional neural network (CNN) to adaptively learn the transformed domain for leveraging tensor low-rank priors. Under the supervised mechanism, the learning of the tensor low-rank domain is directly guided by the reconstruction accuracy. Specifically, we generalize the traditional t-SVD to a transformed version based on arbitrary high-dimensional unitary transformations and introduce a novel unitary transformed tensor nuclear norm (UTNN). Subsequently, we present a dynamic MRI reconstruction model based on UTNN and devise an efficient iterative optimization algorithm using ADMM, which is finally unfolded into the proposed T2LR-Net. Experiments on two dynamic cardiac MRI datasets demonstrate that T2LR-Net outperforms the state-of-the-art optimization-based and unrolling network-based methods.

LGAug 13, 2024
Causal Effect Estimation using identifiable Variational AutoEncoder with Latent Confounders and Post-Treatment Variables

Yang Xie, Ziqi Xu, Debo Cheng et al.

Estimating causal effects from observational data is challenging, especially in the presence of latent confounders. Much work has been done on addressing this challenge, but most of the existing research ignores the bias introduced by the post-treatment variables. In this paper, we propose a novel method of joint Variational AutoEncoder (VAE) and identifiable Variational AutoEncoder (iVAE) for learning the representations of latent confounders and latent post-treatment variables from their proxy variables, termed CPTiVAE, to achieve unbiased causal effect estimation from observational data. We further prove the identifiability in terms of the representation of latent post-treatment variables. Extensive experiments on synthetic and semi-synthetic datasets demonstrate that the CPTiVAE outperforms the state-of-the-art methods in the presence of latent confounders and post-treatment variables. We further apply CPTiVAE to a real-world dataset to show its potential application.

CVDec 13, 2024Code
CaLoRAify: Calorie Estimation with Visual-Text Pairing and LoRA-Driven Visual Language Models

Dongyu Yao, Keling Yao, Junhong Zhou et al.

The obesity phenomenon, known as the heavy issue, is a leading cause of preventable chronic diseases worldwide. Traditional calorie estimation tools often rely on specific data formats or complex pipelines, limiting their practicality in real-world scenarios. Recently, vision-language models (VLMs) have excelled in understanding real-world contexts and enabling conversational interactions, making them ideal for downstream tasks such as ingredient analysis. However, applying VLMs to calorie estimation requires domain-specific data and alignment strategies. To this end, we curated CalData, a 330K image-text pair dataset tailored for ingredient recognition and calorie estimation, combining a large-scale recipe dataset with detailed nutritional instructions for robust vision-language training. Built upon this dataset, we present CaLoRAify, a novel VLM framework aligning ingredient recognition and calorie estimation via training with visual-text pairs. During inference, users only need a single monocular food image to estimate calories while retaining the flexibility of agent-based conversational interaction. With Low-rank Adaptation (LoRA) and Retrieve-augmented Generation (RAG) techniques, our system enhances the performance of foundational VLMs in the vertical domain of calorie estimation. Our code and data are fully open-sourced at https://github.com/KennyYao2001/16824-CaLORAify.

NANov 21, 2024Code
Differentiable SVD based on Moore-Penrose Pseudoinverse for Inverse Imaging Problems

Yinghao Zhang, Yue Hu

Low-rank regularization-based deep unrolling networks have achieved remarkable success in various inverse imaging problems (IIPs). However, the singular value decomposition (SVD) is non-differentiable when duplicated singular values occur, leading to severe numerical instability during training. In this paper, we propose a differentiable SVD based on the Moore-Penrose pseudoinverse to address this issue. To the best of our knowledge, this is the first work to provide a comprehensive analysis of the differentiability of the trivial SVD. Specifically, we show that the non-differentiability of SVD is essentially due to an underdetermined system of linear equations arising in the derivation process. We utilize the Moore-Penrose pseudoinverse to solve the system, thereby proposing a differentiable SVD. A numerical stability analysis in the context of IIPs is provided. Experimental results in color image compressed sensing and dynamic MRI reconstruction show that our proposed differentiable SVD can effectively address the numerical instability issue while ensuring computational precision. Code is available at https://github.com/yhao-z/SVD-inv.

LGJan 5
FedBiCross: A Bi-Level Optimization Framework to Tackle Non-IID Challenges in Data-Free One-Shot Federated Learning on Medical Data

Yuexuan Xia, Yinghao Zhang, Yalin Liu et al.

Data-free knowledge distillation-based one-shot federated learning (OSFL) trains a model in a single communication round without sharing raw data, making OSFL attractive for privacy-sensitive medical applications. However, existing methods aggregate predictions from all clients to form a global teacher. Under non-IID data, conflicting predictions cancel out during averaging, yielding near-uniform soft labels that provide weak supervision for distillation. We propose FedBiCross, a personalized OSFL framework with three stages: (1) clustering clients by model output similarity to form coherent sub-ensembles, (2) bi-level cross-cluster optimization that learns adaptive weights to selectively leverage beneficial cross-cluster knowledge while suppressing negative transfer, and (3) personalized distillation for client-specific adaptation. Experiments on four medical image datasets demonstrate that FedBiCross consistently outperforms state-of-the-art baselines across different non-IID degrees.

LGDec 8, 2023
Disentangled Latent Representation Learning for Tackling the Confounding M-Bias Problem in Causal Inference

Debo Cheng, Yang Xie, Ziqi Xu et al.

In causal inference, it is a fundamental task to estimate the causal effect from observational data. However, latent confounders pose major challenges in causal inference in observational data, for example, confounding bias and M-bias. Recent data-driven causal effect estimators tackle the confounding bias problem via balanced representation learning, but assume no M-bias in the system, thus they fail to handle the M-bias. In this paper, we identify a challenging and unsolved problem caused by a variable that leads to confounding bias and M-bias simultaneously. To address this problem with co-occurring M-bias and confounding bias, we propose a novel Disentangled Latent Representation learning framework for learning latent representations from proxy variables for unbiased Causal effect Estimation (DLRCE) from observational data. Specifically, DLRCE learns three sets of latent representations from the measured proxy variables to adjust for the confounding bias and M-bias. Extensive experiments on both synthetic and three real-world datasets demonstrate that DLRCE significantly outperforms the state-of-the-art estimators in the case of the presence of both confounding bias and M-bias.

CVFeb 17, 2025
JotlasNet: Joint Tensor Low-Rank and Attention-based Sparse Unrolling Network for Accelerating Dynamic MRI

Yinghao Zhang, Haiyan Gui, Ningdi Yang et al.

Joint low-rank and sparse unrolling networks have shown superior performance in dynamic MRI reconstruction. However, existing works mainly utilized matrix low-rank priors, neglecting the tensor characteristics of dynamic MRI images, and only a global threshold is applied for the sparse constraint to the multi-channel data, limiting the flexibility of the network. Additionally, most of them have inherently complex network structure, with intricate interactions among variables. In this paper, we propose a novel deep unrolling network, JotlasNet, for dynamic MRI reconstruction by jointly utilizing tensor low-rank and attention-based sparse priors. Specifically, we utilize tensor low-rank prior to exploit the structural correlations in high-dimensional data. Convolutional neural networks are used to adaptively learn the low-rank and sparse transform domains. A novel attention-based soft thresholding operator is proposed to assign a unique learnable threshold to each channel of the data in the CNN-learned sparse domain. The network is unrolled from the elaborately designed composite splitting algorithm and thus features a simple yet efficient parallel structure. Extensive experiments on two datasets (OCMR, CMRxRecon) demonstrate the superior performance of JotlasNet in dynamic MRI reconstruction.

LGOct 24, 2025
Cloud-Fog-Edge Collaborative Computing for Sequential MIoT Workflow: A Two-Tier DDPG-Based Scheduling Framework

Yuhao Fu, Yinghao Zhang, Yalin Liu et al.

The Medical Internet of Things (MIoT) demands stringent end-to-end latency guarantees for sequential healthcare workflows deployed over heterogeneous cloud-fog-edge infrastructures. Scheduling these sequential workflows to minimize makespan is an NP-hard problem. To tackle this challenge, we propose a Two-tier DDPG-based scheduling framework that decomposes the scheduling decision into a hierarchical process: a global controller performs layer selection (edge, fog, or cloud), while specialized local controllers handle node assignment within the chosen layer. The primary optimization objective is the minimization of the workflow makespan. Experiments results validate our approach, demonstrating increasingly superior performance over baselines as workflow complexity rises. This trend highlights the frameworks ability to learn effective long-term strategies, which is critical for complex, large-scale MIoT scheduling scenarios.

CLAug 23, 2025
Unbiased Reasoning for Knowledge-Intensive Tasks in Large Language Models via Conditional Front-Door Adjustment

Bo Zhao, Yinghao Zhang, Ziqi Xu et al.

Large Language Models (LLMs) have shown impressive capabilities in natural language processing but still struggle to perform well on knowledge-intensive tasks that require deep reasoning and the integration of external knowledge. Although methods such as Retrieval-Augmented Generation (RAG) and Chain-of-Thought (CoT) have been proposed to enhance LLMs with external knowledge, they still suffer from internal bias in LLMs, which often leads to incorrect answers. In this paper, we propose a novel causal prompting framework, Conditional Front-Door Prompting (CFD-Prompting), which enables the unbiased estimation of the causal effect between the query and the answer, conditional on external knowledge, while mitigating internal bias. By constructing counterfactual external knowledge, our framework simulates how the query behaves under varying contexts, addressing the challenge that the query is fixed and is not amenable to direct causal intervention. Compared to the standard front-door adjustment, the conditional variant operates under weaker assumptions, enhancing both robustness and generalisability of the reasoning process. Extensive experiments across multiple LLMs and benchmark datasets demonstrate that CFD-Prompting significantly outperforms existing baselines in both accuracy and robustness.

IVNov 4, 2021
Generalized Radiograph Representation Learning via Cross-supervision between Images and Free-text Radiology Reports

Hong-Yu Zhou, Xiaoyu Chen, Yinghao Zhang et al.

Pre-training lays the foundation for recent successes in radiograph analysis supported by deep learning. It learns transferable image representations by conducting large-scale fully-supervised or self-supervised learning on a source domain. However, supervised pre-training requires a complex and labor intensive two-stage human-assisted annotation process while self-supervised learning cannot compete with the supervised paradigm. To tackle these issues, we propose a cross-supervised methodology named REviewing FreE-text Reports for Supervision (REFERS), which acquires free supervision signals from original radiology reports accompanying the radiographs. The proposed approach employs a vision transformer and is designed to learn joint representations from multiple views within every patient study. REFERS outperforms its transfer learning and self-supervised learning counterparts on 4 well-known X-ray datasets under extremely limited supervision. Moreover, REFERS even surpasses methods based on a source domain of radiographs with human-assisted structured labels. Thus REFERS has the potential to replace canonical pre-training methodologies.

CVSep 7, 2021
nnFormer: Interleaved Transformer for Volumetric Segmentation

Hong-Yu Zhou, Jiansen Guo, Yinghao Zhang et al.

Transformer, the model of choice for natural language processing, has drawn scant attention from the medical imaging community. Given the ability to exploit long-term dependencies, transformers are promising to help atypical convolutional neural networks to overcome their inherent shortcomings of spatial inductive bias. However, most of recently proposed transformer-based segmentation approaches simply treated transformers as assisted modules to help encode global context into convolutional representations. To address this issue, we introduce nnFormer, a 3D transformer for volumetric medical image segmentation. nnFormer not only exploits the combination of interleaved convolution and self-attention operations, but also introduces local and global volume-based self-attention mechanism to learn volume representations. Moreover, nnFormer proposes to use skip attention to replace the traditional concatenation/summation operations in skip connections in U-Net like architecture. Experiments show that nnFormer significantly outperforms previous transformer-based counterparts by large margins on three public datasets. Compared to nnUNet, nnFormer produces significantly lower HD95 and comparable DSC results. Furthermore, we show that nnFormer and nnUNet are highly complementary to each other in model ensembling.