Haijie Yuan

CE
4 papers
8 citations
Novelty: 44%
AI Score: 49

4 Papers

42.7 · CV · May 27
Trajectory Constraints for Imaging Inverse Problems

Chaoyan Huang, Haijie Yuan, Saiprasad Ravishankar

Diffusion-based and iterative methods have become effective tools for solving imaging inverse problems. Their reconstruction process naturally forms a trajectory of intermediate estimates, yet most methods do not explicitly regularize the transitions between consecutive states. To address this limitation, we introduce TRACE, a training-free TRAjectory-Constrained rEconstruction framework that stabilizes the reconstruction path by coupling adjacent states along the trajectory. This yields a trajectory-level model that can be interpreted as a sequence of proximal updates. Since the exact proximal update is generally intractable, we approximate it with a neural mapping, producing a diffusion-like reconstruction process with explicit coupling between neighboring states. We provide a stability analysis showing that temporal coupling bounds trajectory variation and that this control is preserved under untrained network updates. Experiments on linear and nonlinear image reconstruction tasks show that TRACE improves reconstruction quality, and trajectory-level analyses and ablations confirm that temporal coupling directly affects state transitions along the reconstruction path.
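To make "a sequence of proximal updates that couples adjacent states" concrete, here is a minimal NumPy sketch. It assumes a quadratic data term 0.5·||Ax − y||² so that the proximal map has an exact closed form; TRACE instead approximates the proximal update with a neural mapping, so this is an illustration of the coupling idea, not the authors' method. The function names `prox_step`, `coupled_trajectory`, and `transition_norms` and the coupling weight `lam` are hypothetical.

```python
import numpy as np

# Hypothetical sketch: a quadratic data term stands in for the measurement model,
# and the exact closed-form proximal map stands in for TRACE's neural approximation.

def prox_step(A, y, x_prev, lam):
    """Return argmin_x 0.5*||A x - y||^2 + (lam/2)*||x - x_prev||^2.

    For this quadratic objective the minimizer solves
    (A^T A + lam*I) x = A^T y + lam*x_prev.
    """
    n = A.shape[1]
    return np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ y + lam * x_prev)


def coupled_trajectory(A, y, x0, lam, steps):
    """Chain proximal updates so each state is coupled to the previous one."""
    traj = [x0]
    for _ in range(steps):
        traj.append(prox_step(A, y, traj[-1], lam))
    return traj


def transition_norms(traj):
    """Size of each transition ||x_{t+1} - x_t|| along the trajectory."""
    return [float(np.linalg.norm(b - a)) for a, b in zip(traj[:-1], traj[1:])]


rng = np.random.default_rng(0)
A = rng.standard_normal((20, 10))
y = rng.standard_normal(20)
x0 = np.zeros(10)

weak = transition_norms(coupled_trajectory(A, y, x0, lam=0.1, steps=5))
strong = transition_norms(coupled_trajectory(A, y, x0, lam=10.0, steps=5))
print("weak coupling:  ", weak)
print("strong coupling:", strong)
```

Because the proximal map of a convex function is nonexpansive, the transition norms never grow along either trajectory, and a larger coupling weight `lam` shrinks the first transition. This mirrors the kind of bounded trajectory variation the abstract describes; it is not the paper's stability analysis.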

SD · Jan 30 · Code
LPIPS-AttnWav2Lip: Generic Audio-Driven lip synchronization for Talking Head Generation in the Wild

Zhipeng Chen, Xinheng Wang, Lun Xie et al.

Researchers have shown growing interest in audio-driven talking head generation. The primary challenge in this task is achieving audio-visual coherence between the lips and the audio, known as lip synchronization. This paper proposes LPIPS-AttnWav2Lip, a generic method for reconstructing face images of any speaker from audio. We use a U-Net architecture built on residual CBAM (Convolutional Block Attention Module) blocks to better encode and fuse audio and visual information. Additionally, a semantic alignment module extends the receptive field of the generator network to efficiently capture the spatial and channel information of the visual features, and matches the statistics of the visual features to the audio latent vector, thereby adjusting the visual features and injecting the audio content into them. To achieve accurate lip synchronization and generate realistic, high-quality images, our approach adopts the LPIPS (Learned Perceptual Image Patch Similarity) loss, which simulates human judgment of image quality and reduces the likelihood of instability during training. Subjective and objective evaluations demonstrate that the proposed method achieves outstanding lip synchronization accuracy and visual quality. The code is available at https://github.com/FelixChan9527/LPIPS-AttnWav2Lip
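The abstract says the semantic alignment module matches the statistics of visual features to an audio latent vector. One common way to do this is adaptive instance normalization (AdaIN): normalize each visual channel, then rescale and shift it with parameters predicted from the conditioning signal. The NumPy sketch below assumes that AdaIN-style formulation; the function `inject_audio` and the projections `W_gamma` and `W_beta` are hypothetical, and the paper's actual module may differ.

```python
import numpy as np

# Hypothetical sketch: AdaIN-style modulation is assumed here; W_gamma and W_beta
# are illustrative linear projections, not the paper's actual layers.

def inject_audio(visual, audio_latent, W_gamma, W_beta, eps=1e-5):
    """Re-normalize each visual channel to statistics predicted from audio.

    visual:          (C, H, W) feature map from the visual encoder
    audio_latent:    (D,) audio embedding
    W_gamma, W_beta: (C, D) projections giving per-channel scale and shift
    """
    mu = visual.mean(axis=(1, 2), keepdims=True)
    sigma = visual.std(axis=(1, 2), keepdims=True)
    normalized = (visual - mu) / (sigma + eps)       # ~zero mean, ~unit std per channel
    gamma = (W_gamma @ audio_latent)[:, None, None]  # audio-driven scale
    beta = (W_beta @ audio_latent)[:, None, None]    # audio-driven shift
    return gamma * normalized + beta


rng = np.random.default_rng(0)
C, height, width, D = 8, 16, 16, 32
visual = rng.standard_normal((C, height, width)) * 3.0 + 1.0
audio = rng.standard_normal(D)
W_gamma = rng.standard_normal((C, D)) * 0.1
W_beta = rng.standard_normal((C, D)) * 0.1

out = inject_audio(visual, audio, W_gamma, W_beta)
```

After modulation, each output channel's mean equals the audio-predicted shift and its standard deviation is (up to `eps`) the magnitude of the audio-predicted scale, so the visual statistics are fully determined by the audio embedding.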

53.4 · CE · May 23
Fractional-gradient Sparsity with Autoencoding Sequential Deep Image Prior for 3D CT Reconstruction

Haijie Yuan, Chaoyan Huang, Srijita Bandopadhyay et al.

3D volumetric reconstruction from incomplete or noisy measurements is a fundamental problem in medical imaging and computational tomography. Deep image prior (DIP)-based methods have recently shown strong capability for solving inverse problems without requiring large training datasets. However, directly extending DIP to 3D reconstruction with fully 3D networks can incur high computational cost, while slice-by-slice 2D DIP approaches may produce inter-slice inconsistencies because they lack explicit regularization along the third (slice) direction. In this paper, we propose a novel volumetric reconstruction framework, Fractional-gradient Autoencoding Sequential Tomography DIP (FAST-DIP), which integrates input-adaptive sequential deep image prior modeling of slices with fractional sparsity regularization to capture inter-slice dependencies. Specifically, we introduce a fractional ℓ1/ℓ2-based sparsity prior on the gradients along the slice (z) direction to explicitly enforce inter-slice structural consistency. We further provide a theoretical analysis of the proposed alternating minimization algorithm under the majorization-minimization (MM) framework, establishing monotonic descent of the objective function and convergence to a critical point under the Kurdyka-Łojasiewicz (KL) property. Experimental results for 3D X-ray computed tomography (CT) reconstruction demonstrate that the proposed method improves reconstruction quality and structural consistency compared with existing DIP-based approaches.
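The fractional ℓ1/ℓ2 prior on z-gradients can be evaluated in a few lines. The sketch below computes only the regularizer value, not the FAST-DIP alternating minimization; it assumes the volume is stored as a NumPy array indexed (z, y, x) with slices along axis 0, uses forward differences as the z-gradient, and the function name `z_gradient_l1_over_l2` is hypothetical.

```python
import numpy as np

# Hypothetical sketch: assumes a (z, y, x) volume with slices along axis 0;
# this evaluates the regularizer only, not the FAST-DIP solver.

def z_gradient_l1_over_l2(volume, eps=1e-12):
    """Fractional l1/l2 penalty on forward differences along the slice axis."""
    dz = np.diff(volume, axis=0)          # inter-slice differences, shape (Z-1, Y, X)
    l2 = np.linalg.norm(dz.ravel())
    if l2 < eps:                          # perfectly consistent slices: no penalty
        return 0.0
    return float(np.abs(dz).sum() / l2)


step = np.zeros((10, 4, 4))
step[5:] = 1.0                            # one sharp change between slices 4 and 5
noisy = np.random.default_rng(0).standard_normal((10, 4, 4))

print(z_gradient_l1_over_l2(step))        # 16 / 4 = 4.0
print(z_gradient_l1_over_l2(noisy))       # much larger: changes spread over all slices
```

For any nonzero gradient field the ratio lies between 1 and the square root of the number of entries, is unchanged when the volume is rescaled, and is smallest when inter-slice changes are concentrated in few locations, properties that make ℓ1/ℓ2 a common scale-invariant sparsity surrogate.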

96.7 · IV · May 14
ForcingDAS: Unified and Robust Data Assimilation via Diffusion Forcing

Yixuan Jia, Siyi Chen, Yida Pan et al.

Data assimilation (DA) estimates the state of an evolving dynamical system from noisy, partial observations, and is widely used in scientific simulation as well as weather and climate science. In practice, filtering methods rely on frame-to-frame transition models. However, these models are fragile when observations are non-Markovian (when they form only a partial slice of a higher-dimensional latent state as in real-world weather data): they tend to accumulate errors over long horizons. At the same time, learned DA methods typically commit to a single regime, either filtering (nowcasting, real-time forecasting) or smoothing (retrospective reanalysis), which splits what should be a shared prior across application-specific pipelines. To address both issues, we introduce ForcingDAS, a unified and robust DA framework. Built on Diffusion Forcing with an independent noise level assigned to each frame, ForcingDAS learns a joint-trajectory prior instead of frame-to-frame transitions. This allows it to capture long-horizon temporal dependencies and reduce error accumulation. In addition, the same trained model spans the full filtering to smoothing spectrum at inference time. Specifically, nowcasting, fixed-lag smoothing, and batch reanalysis are selected through the inference schedule alone, without retraining. We evaluate ForcingDAS on 2D Navier-Stokes vorticity, precipitation nowcasting, and global atmospheric state estimation. Across all settings, a single model is competitive with or outperforms both learned and classical baselines that are specialized for individual regimes, with the largest gains observed on real-world weather benchmarks.
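Assigning an independent noise level to each frame means an inference schedule is simply a table of per-frame noise levels over denoising steps, which is why different regimes can be chosen without retraining. The NumPy sketch below builds one hypothetical family of such tables (a "pyramid" schedule controlled by a `lag` parameter); the function `pyramid_schedule`, the `lag` parameter, and the mapping from lag to regime are illustrative assumptions, not ForcingDAS's actual schedules.

```python
import numpy as np

# Hypothetical sketch: illustrates per-frame noise levels in the spirit of
# Diffusion Forcing; the pyramid schedule and `lag` are illustrative assumptions,
# not ForcingDAS's actual inference schedules.

def pyramid_schedule(num_frames, num_steps, k_max, lag):
    """Return noise levels of shape (total_steps + 1, num_frames).

    Row s gives the noise level of every frame at denoising step s.
    Frame t starts denoising `lag` steps after frame t - 1, so at any
    step earlier frames are at least as clean as later ones.
    """
    total = num_steps + lag * (num_frames - 1)
    steps = np.arange(total + 1)[:, None]      # (total + 1, 1)
    frames = np.arange(num_frames)[None, :]    # (1, num_frames)
    progress = np.clip(steps - lag * frames, 0, num_steps)
    return k_max * (1.0 - progress / num_steps)


joint = pyramid_schedule(num_frames=4, num_steps=5, k_max=1.0, lag=0)
sequential = pyramid_schedule(num_frames=4, num_steps=5, k_max=1.0, lag=5)
print(joint.shape, sequential.shape)   # (6, 4) (21, 4)
```

With `lag=0` every frame shares one noise level and the whole window is denoised jointly, loosely resembling batch reanalysis; with `lag >= num_steps` each frame is fully denoised before the next begins, loosely resembling sequential filtering; intermediate lags fall in between. Since the schedule is only an inference-time input, one trained denoiser could in principle be driven by any of them.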