Virtual Pets: Animatable Animal Generation in 3D Scenes
This work addresses the scarcity of 3D motion data for immersive virtual experiences by generating animatable animals in 3D scenes, though the contribution is incremental in that it builds on existing techniques such as NeRF.
The paper tackles the problem of generating realistic and diverse 3D motions for animals in 3D scenes. It introduces Virtual Pet, a pipeline that reconstructs animatable animal models from monocular internet videos and produces temporally coherent 4D outputs, demonstrated on cats in indoor environments.
Toward unlocking the potential of generative models in immersive 4D experiences, we introduce Virtual Pet, a novel pipeline to model realistic and diverse motions for target animal species within a 3D environment. To circumvent the limited availability of 3D motion data aligned with environmental geometry, we leverage monocular internet videos and extract deformable NeRF representations for the foreground and static NeRF representations for the background. For this, we develop a reconstruction strategy, encompassing species-level shared template learning and per-video fine-tuning. Utilizing the reconstructed data, we then train a conditional 3D motion model to learn the trajectory and articulation of foreground animals in the context of 3D backgrounds. We showcase the efficacy of our pipeline with comprehensive qualitative and quantitative evaluations using cat videos. We also demonstrate versatility across unseen cats and indoor environments, producing temporally coherent 4D outputs for enriched virtual experiences.
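To make the foreground/background split concrete, here is a minimal sketch of how samples from a deformable foreground field and a static background field could be combined along a single camera ray using the standard NeRF volume-rendering quadrature. The abstract does not say how the two representations are composited, so the density-sum rule, the function name `composite_ray`, and all argument names are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def composite_ray(sigma_fg, rgb_fg, sigma_bg, rgb_bg, deltas):
    """Render one ray through a foreground and a background field.

    Hypothetical helper: assumes both fields were already queried at the
    same N sample points along the ray.

    sigma_fg, sigma_bg: (N,) non-negative densities
    rgb_fg, rgb_bg:     (N, 3) colors in [0, 1]
    deltas:             (N,) distances between consecutive samples
    """
    # Assumption: densities of the two fields add at each sample.
    sigma = sigma_fg + sigma_bg

    # Density-weighted color mix; eps avoids 0/0 in empty space.
    eps = 1e-10
    rgb = (sigma_fg[:, None] * rgb_fg + sigma_bg[:, None] * rgb_bg) / (
        sigma[:, None] + eps
    )

    # Standard NeRF quadrature: per-sample opacity from density and step size.
    alpha = 1.0 - np.exp(-sigma * deltas)

    # Transmittance: fraction of light that reaches sample i unoccluded.
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1]]))

    weights = trans * alpha
    color = (weights[:, None] * rgb).sum(axis=0)
    return color, weights
```

In this sketch an opaque foreground sample near the camera hides whatever background lies behind it, which is the behavior one would want when an animal walks in front of furniture. A real pipeline would also need to warp the foreground samples by the animal's articulation at each time step before querying the field; that step is omitted here.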