Zhongjian Wang

h-index13

11papers

241citations

Novelty52%

AI Score47

Ranked #34,543 of 194,257 authors (top 18%)#12,302 in CV (top 21%)

11 Papers

24.8CVApr 11, 2023

One-Shot High-Fidelity Talking-Head Synthesis with Deformable Neural Radiance Field

Weichuang Li, Longhao Zhang, Dong Wang et al.

Talking head generation aims to generate faces that maintain the identity information of the source image and imitate the motion of the driving image. Most pioneering methods rely primarily on 2D representations and thus will inevitably suffer from face distortion when large head rotations are encountered. Recent works instead employ explicit 3D structural representations or implicit neural rendering to improve performance under large pose changes. Nevertheless, the fidelity of identity and expression is not so desirable, especially for novel-view synthesis. In this paper, we propose HiDe-NeRF, which achieves high-fidelity and free-view talking-head synthesis. Drawing on the recently proposed Deformable Neural Radiance Fields, HiDe-NeRF represents the 3D dynamic scene into a canonical appearance field and an implicit deformation field, where the former comprises the canonical source face and the latter models the driving pose and expression. In particular, we improve fidelity from two aspects: (i) to enhance identity expressiveness, we design a generalized appearance module that leverages multi-scale volume features to preserve face shape and details; (ii) to improve expression preciseness, we propose a lightweight deformation module that explicitly decouples the pose and expression to enable precise expression modeling. Extensive experiments demonstrate that our proposed approach can generate better results than previous works. Project page: https://www.waytron.net/hidenerf/

18.4LGJan 19, 2023Code

Mathematical analysis of singularities in the diffusion model under the submanifold assumption

Yubin Lu, Zhongjian Wang, Guillaume Bal

This paper concerns the mathematical analyses of the diffusion model in machine learning. The drift term of the backward sampling process is represented as a conditional expectation involving the data distribution and the forward diffusion. The training process aims to find such a drift function by minimizing the mean-squared residue related to the conditional expectation. Using small-time approximations of the Green's function of the forward diffusion, we show that the analytical mean drift function in DDPM and the score function in SGM asymptotically blow up in the final stages of the sampling process for singular data distributions such as those concentrated on lower-dimensional manifolds, and are therefore difficult to approximate by a network. To overcome this difficulty, we derive a new target function and associated loss, which remains bounded even for singular data distributions. We validate the theoretical findings with several numerical examples.

1.2NANov 26, 2017

Computing effective diffusivity of chaotic and stochastic flows using structure preserving schemes

Zhongjian Wang, Jack Xin, Zhiwen Zhang

In this paper we study the problem of computing the effective diffusivity for a particle moving in chaotic and stochastic flows. In addition we numerically investigate the residual diffusion phenomenon in chaotic advection. The residual diffusion refers to the non-zero effective (homogenized) diffusion in the limit of zero molecular diffusion as a result of chaotic mixing of the streamlines. In this limit traditional numerical methods typically fail since the solutions of the advection-diffusion equation develop sharp gradients. Instead of solving the Fokker-Planck equation in the Eulerian formulation, we compute the motion of particles in the Lagrangian formulation, which is modelled by stochastic differential equations (SDEs). We propose a new numerical integrator based on a stochastic splitting method to solve the corresponding SDEs in which the deterministic subproblem is symplectic preserving while the random subproblem can be viewed as a perturbation. We provide rigorous error analysis for the new numerical integrator using the backward error analysis technique and show that our method outperforms standard Euler-based integrators. Numerical results are presented to demonstrate the accuracy and efficiency of the proposed method for several typical chaotic and stochastic flow problems of physical interests.

3.3COMP-PHAug 31, 2022

A DeepParticle method for learning and generating aggregation patterns in multi-dimensional Keller-Segel chemotaxis systems

Zhongjian Wang, Jack Xin, Zhiwen Zhang

We study a regularized interacting particle method for computing aggregation patterns and near singular solutions of a Keller-Segal (KS) chemotaxis system in two and three space dimensions, then further develop DeepParticle (DP) method to learn and generate solutions under variations of physical parameters. The KS solutions are approximated as empirical measures of particles which self-adapt to the high gradient part of solutions. We utilize the expressiveness of deep neural networks (DNNs) to represent the transform of samples from a given initial (source) distribution to a target distribution at finite time T prior to blowup without assuming invertibility of the transforms. In the training stage, we update the network weights by minimizing a discrete 2-Wasserstein distance between the input and target empirical measures. To reduce computational cost, we develop an iterative divide-and-conquer algorithm to find the optimal transition matrix in the Wasserstein distance. We present numerical results of DP framework for successful learning and generation of KS dynamics in the presence of laminar and chaotic flows. The physical parameter in this work is either the small diffusivity of chemo-attractant or the reciprocal of the flow amplitude in the advection-dominated regime.

2.4MATH-PHMay 19

Inverse scattering for waveguides in topological insulators

Guillaume Bal, Xixian Wang, Zhongjian Wang

This paper concerns the inverse scattering problem of a topologically non-trivial waveguide separating two-dimensional topological insulators. We consider the specific model of a Dirac system. We show that a short-range perturbation can be fully reconstructed from scattering data in a linearized setting and in a finite-dimensional setting under a smallness constraint. We also provide a stability result in appropriate topologies. We then solve the problem numerically by means of a standard adjoint method and illustrate our theoretical findings with several numerical simulations.

3.3COMP-PHSep 5, 2022

A variational neural network approach for glacier modelling with nonlinear rheology

Tiangang Cui, Zhongjian Wang, Zhiwen Zhang

In this paper, we propose a mesh-free method to solve full stokes equation which models the glacier movement with nonlinear rheology. Our approach is inspired by the Deep-Ritz method proposed in [12]. We first formulate the solution of non-Newtonian ice flow model into the minimizer of a variational integral with boundary constraints. The solution is then approximated by a deep neural network whose loss function is the variational integral plus soft constraint from the mixed boundary conditions. Instead of introducing mesh grids or basis functions to evaluate the loss function, our method only requires uniform samplers of the domain and boundaries. To address instability in real-world scaling, we re-normalize the input of the network at the first layer and balance the regularizing factors for each individual boundary. Finally, we illustrate the performance of our method by several numerical experiments, including a 2D model with analytical solution, Arolla glacier model with real scaling and a 3D model with periodic boundary conditions. Numerical results show that our proposed method is efficient in solving the non-Newtonian mechanics arising from glacier modeling with nonlinear rheology.

13.4LGDec 15, 2024

Wasserstein Bounds for generative diffusion models with Gaussian tail targets

Xixian Wang, Zhongjian Wang

We present an estimate of the Wasserstein distance between the data distribution and the generation of score-based generative models. The sampling complexity with respect to dimension is $\mathcal{O}(\sqrt{d})$, with a logarithmic constant. In the analysis, we assume a Gaussian-type tail behavior of the data distribution and an $ε$-accurate approximation of the score. Such a Gaussian tail assumption is general, as it accommodates a practical target - the distribution from early stopping techniques with bounded support. The crux of the analysis lies in the global Lipschitz bound of the score, which is shown from the Gaussian tail assumption by a dimension-independent estimate of the heat kernel. Consequently, our complexity bound scales linearly (up to a logarithmic constant) with the square root of the trace of the covariance operator, which relates to the invariant distribution of the forward process.

14.4CVApr 3, 2025

OmniTalker: One-shot Real-time Text-Driven Talking Audio-Video Generation With Multimodal Style Mimicking

Zhongjian Wang, Peng Zhang, Jinwei Qi et al.

Although significant progress has been made in audio-driven talking head generation, text-driven methods remain underexplored. In this work, we present OmniTalker, a unified framework that jointly generates synchronized talking audio-video content from input text while emulating the speaking and facial movement styles of the target identity, including speech characteristics, head motion, and facial dynamics. Our framework adopts a dual-branch diffusion transformer (DiT) architecture, with one branch dedicated to audio generation and the other to video synthesis. At the shallow layers, cross-modal fusion modules are introduced to integrate information between the two modalities. In deeper layers, each modality is processed independently, with the generated audio decoded by a vocoder and the video rendered using a GAN-based high-quality visual renderer. Leveraging the in-context learning capability of DiT through a masked-infilling strategy, our model can simultaneously capture both audio and visual styles without requiring explicit style extraction modules. Thanks to the efficiency of the DiT backbone and the optimized visual renderer, OmniTalker achieves real-time inference at 25 FPS. To the best of our knowledge, OmniTalker is the first one-shot framework capable of jointly modeling speech and facial styles in real time. Extensive experiments demonstrate its superiority over existing methods in terms of generation quality, particularly in preserving style consistency and ensuring precise audio-video synchronization, all while maintaining efficient inference.

33.6CVSep 17, 2025

Wan-Animate: Unified Character Animation and Replacement with Holistic Replication

Gang Cheng, Xin Gao, Li Hu et al.

We introduce Wan-Animate, a unified framework for character animation and replacement. Given a character image and a reference video, Wan-Animate can animate the character by precisely replicating the expressions and movements of the character in the video to generate high-fidelity character videos. Alternatively, it can integrate the animated character into the reference video to replace the original character, replicating the scene's lighting and color tone to achieve seamless environmental integration. Wan-Animate is built upon the Wan model. To adapt it for character animation tasks, we employ a modified input paradigm to differentiate between reference conditions and regions for generation. This design unifies multiple tasks into a common symbolic representation. We use spatially-aligned skeleton signals to replicate body motion and implicit facial features extracted from source images to reenact expressions, enabling the generation of character videos with high controllability and expressiveness. Furthermore, to enhance environmental integration during character replacement, we develop an auxiliary Relighting LoRA. This module preserves the character's appearance consistency while applying the appropriate environmental lighting and color tone. Experimental results demonstrate that Wan-Animate achieves state-of-the-art performance. We are committed to open-sourcing the model weights and its source code.

2.3NAApr 25, 2025

PODNO: Proper Orthogonal Decomposition Neural Operators

Zilan Cheng, Zhongjian Wang, Li-Lian Wang et al.

In this paper, we introduce Proper Orthogonal Decomposition Neural Operators (PODNO) for solving partial differential equations (PDEs) dominated by high-frequency components. Building on the structure of Fourier Neural Operators (FNO), PODNO replaces the Fourier transform with (inverse) orthonormal transforms derived from the Proper Orthogonal Decomposition (POD) method to construct the integral kernel. Due to the optimality of POD basis, the PODNO has potential to outperform FNO in both accuracy and computational efficiency for high-frequency problems. From analysis point of view, we established the universality of a generalization of PODNO, termed as Generalized Spectral Operator (GSO). In addition, we evaluate PODNO's performance numerically on dispersive equations such as the Nonlinear Schrodinger (NLS) equation and the Kadomtsev-Petviashvili (KP) equation.

8.4LGNov 2, 2021

DeepParticle: learning invariant measure by a deep neural network minimizing Wasserstein distance on data generated from an interacting particle method

Zhongjian Wang, Jack Xin, Zhiwen Zhang

We introduce the so called DeepParticle method to learn and generate invariant measures of stochastic dynamical systems with physical parameters based on data computed from an interacting particle method (IPM). We utilize the expressiveness of deep neural networks (DNNs) to represent the transform of samples from a given input (source) distribution to an arbitrary target distribution, neither assuming distribution functions in closed form nor a finite state space for the samples. In training, we update the network weights to minimize a discrete Wasserstein distance between the input and target samples. To reduce computational cost, we propose an iterative divide-and-conquer (a mini-batch interior point) algorithm, to find the optimal transition matrix in the Wasserstein distance. We present numerical results to demonstrate the performance of our method for accelerating IPM computation of invariant measures of stochastic dynamical systems arising in computing reaction-diffusion front speeds through chaotic flows. The physical parameter is a large Peclét number reflecting the advection dominated regime of our interest.