Yubo Zhao

h-index: 26

3 papers

4 citations

Novelty: 60%

AI Score: 43

Ranked #53,781 of 194,257 authors (top 28%); #18,708 in CV (top 32%)

3 Papers

1.2 | MATH-PH | Feb 17, 2013

A New Splitting Method for Time-dependent Convection-dominated Diffusion Problems

Feng Shi, Guoping Liang, Yubo Zhao et al.

We present a new splitting method for time-dependent convection-dominated diffusion problems. The original convection-diffusion system is split into two sub-systems: a pure convection system and a diffusion system. At each time step, a convection problem and a diffusion problem are solved successively. The scheme has two attractive features: the convection subproblem is solved explicitly, with a multistep technique introduced to enlarge the stability region so that the resulting scheme behaves like an unconditionally stable scheme; and the diffusion subproblem is always self-adjoint and coercive, so it can be solved efficiently using many existing optimal preconditioned iterative solvers. The scheme is then extended to the Navier-Stokes equations, where the nonlinear convection is resolved by a linear explicit multistep scheme at the convection step, and only a generalized Stokes problem needs to be solved at the diffusion step, with the resulting stiffness matrix being invariant throughout the time marching process. The new schemes require no tuning of stabilization parameters for convection-dominated diffusion problems. Numerical simulations are presented to demonstrate the stability, convergence and performance of the single-step and multistep variants of the new scheme.
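The general idea of splitting a convection-diffusion step into an explicit convection solve followed by a symmetric diffusion solve can be illustrated with a small sketch. This is not the scheme from the paper: it assumes a 1D periodic problem u_t + a u_x = nu u_xx with constant a > 0, uses plain first-order upwind differences for the convection sub-step and backward Euler for the diffusion sub-step, and omits the multistep technique entirely. The function names and all parameter values are illustrative assumptions.

```python
import numpy as np

def diffusion_matrix(n, nu, dx, dt):
    """Backward-Euler diffusion operator I - nu*dt*L on a periodic grid.

    L is the standard second-difference Laplacian, so the matrix is
    symmetric positive definite and does not change between time steps.
    """
    I = np.eye(n)
    L = (np.roll(I, 1, axis=1) + np.roll(I, -1, axis=1) - 2.0 * I) / dx**2
    return I - nu * dt * L

def split_step(u, a, dx, dt, A_diff):
    """One splitting step: explicit convection, then implicit diffusion."""
    # Convection sub-step: first-order upwind, valid for a > 0 and a*dt/dx <= 1.
    u_star = u - a * dt / dx * (u - np.roll(u, 1))
    # Diffusion sub-step: solve (I - nu*dt*L) u_new = u_star.
    return np.linalg.solve(A_diff, u_star)

# Illustrative setup (all values are assumptions, chosen to be convection-dominated).
n, a, nu = 200, 1.0, 1e-3
dx = 1.0 / n
dt = 0.5 * dx / a                      # CFL number 0.5 for the explicit sub-step
x = np.arange(n) * dx
u0 = np.exp(-((x - 0.3) / 0.05) ** 2)  # smooth initial bump

A = diffusion_matrix(n, nu, dx, dt)    # assembled once, reused every step
u = u0.copy()
for _ in range(100):
    u = split_step(u, a, dx, dt, A)

print("mass before:", u0.sum() * dx, "mass after:", u.sum() * dx)
```

In this sketch the diffusion matrix is assembled once and reused at every step, which mirrors the abstract's remark that the diffusion-step matrix stays invariant during time marching. A dense solve is used only for brevity; a preconditioned conjugate gradient solver would be the natural choice for a symmetric positive definite system at realistic sizes.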

8.5 | CV | May 14

Real2Sim in HOI: Toward Physically Plausible HOI Reconstruction from Monocular Videos

Yubo Zhao, Yujin Chai, Yunao Dong et al.

Recovering 4D human-object interaction (HOI) from monocular video is a key step toward scalable 3D content creation, embodied AI, and simulation-based learning. Recent methods can reconstruct temporally coherent human and object trajectories, but these trajectories often remain visual artifacts while failing to preserve stable contact, functional manipulation, or physical plausibility when used as reference motions for humanoid-object simulation. This reveals a fundamental interaction gap: HOI reconstruction should not stop at tracking a human and an object, but should recover the relation that makes their motion a coherent interaction. We introduce HA-HOI, a framework for reconstructing physically plausible 4D HOI animation from in-the-wild monocular videos. Instead of treating the human and object as independent entities in an ambiguous monocular 3D space, we propose a human-first, object-follow formulation. The human motion is recovered as the interaction anchor, and the object is reconstructed, aligned, and refined relative to the human action. The resulting kinematic trajectory is then projected into a physics-based humanoid-object simulation, where it acts as a teacher trajectory for stable physical rollout. Across benchmark and in-the-wild videos, HA-HOI improves human-object alignment, contact consistency, temporal stability, and simulation readiness over prior monocular HOI reconstruction methods. By moving beyond visually plausible trajectory recovery toward physically grounded interaction animation, our work takes a step toward turning general monocular HOI videos into scalable demonstrations for humanoid-object behavior. Project page: https://knoxzhao.github.io/real2sim_in_HOI/

5.0 | CV | Jan 15

CoMoVi: Co-Generation of 3D Human Motions and Realistic Videos

Chengfeng Zhao, Jiazhi Shu, Yubo Zhao et al.

In this paper, we find that the generation of 3D human motions and 2D human videos is intrinsically coupled. 3D motions provide the structural prior for plausibility and consistency in videos, while pre-trained video models offer strong generalization capabilities for motions, which necessitates coupling their generation processes. Based on this, we present CoMoVi, a co-generative framework that couples two video diffusion models (VDMs) to generate 3D human motions and videos synchronously within a single diffusion denoising loop. To achieve this, we first propose an effective 2D human motion representation that can inherit the powerful prior of pre-trained VDMs. Then, we design a dual-branch diffusion model that couples the human motion and video generation processes through mutual feature interaction and 3D-2D cross-attention. Moreover, we curate the CoMoVi Dataset, a large-scale real-world human video dataset with text and motion annotations, covering diverse and challenging human motions. Extensive experiments demonstrate the effectiveness of our method in both 3D human motion and video generation tasks.
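As a rough illustration of what mutual cross-attention between two branches can look like, the sketch below lets a set of motion tokens attend to video tokens and vice versa, using standard scaled dot-product attention with residual connections. It is not the CoMoVi architecture: the token counts, feature width, random weights, and the names `cross_attention`, `motion_tokens`, and `video_tokens` are all assumptions made for the example.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_weights(queries, context, Wq, Wk):
    """Scaled dot-product attention weights, shape (num_queries, num_context)."""
    Q = queries @ Wq
    K = context @ Wk
    return softmax(Q @ K.T / np.sqrt(Q.shape[-1]))

def cross_attention(queries, context, Wq, Wk, Wv):
    """Each query token gathers a weighted mix of projected context tokens."""
    return attention_weights(queries, context, Wq, Wk) @ (context @ Wv)

rng = np.random.default_rng(0)
d = 16                                        # assumed shared feature width
motion_tokens = rng.standard_normal((8, d))   # hypothetical motion-branch features
video_tokens = rng.standard_normal((32, d))   # hypothetical video-branch features

def init():
    # Random projection weights stand in for learned parameters.
    return rng.standard_normal((d, d)) / np.sqrt(d)

m2v = (init(), init(), init())                # motion queries, video context
v2m = (init(), init(), init())                # video queries, motion context

# Mutual interaction: each branch is updated with information from the other.
motion_out = motion_tokens + cross_attention(motion_tokens, video_tokens, *m2v)
video_out = video_tokens + cross_attention(video_tokens, motion_tokens, *v2m)

print(motion_out.shape, video_out.shape)
```

Both updates read from the original token sets, so neither branch sees the other's already-updated features within the same layer; whether a real model interleaves or parallelizes these updates is a design choice this sketch does not settle.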