Vid2Actor: Free-viewpoint Animatable Person Synthesis from Video in the Wild
This work addresses the problem of creating animatable digital humans for applications such as virtual reality and content creation. The method requires neither a pre-rigged model nor ground-truth meshes for training, which may simplify the process for artists and developers.
This paper reconstructs an animatable model of a person from an in-the-wild video. The model can be rendered in any body pose from any camera view without explicit 3D mesh reconstruction. The method uses a volumetric 3D human representation, reconstructed by a deep network trained on the input video, to enable novel pose and view synthesis.
Given an "in-the-wild" video of a person, we reconstruct an animatable model of the person in the video. The output model can be rendered in any body pose to any camera view, via the learned controls, without explicit 3D mesh reconstruction. At the core of our method is a volumetric 3D human representation reconstructed with a deep network trained on input video, enabling novel pose/view synthesis. Our method is an advance over GAN-based image-to-image translation since it allows image synthesis for any pose and camera via the internal 3D representation. At the same time, it does not require a pre-rigged model or ground-truth meshes for training, as mesh-based learning does. Experiments validate the design choices and yield results on synthetic data and on real videos of diverse people performing unconstrained activities (e.g., dancing or playing tennis). Finally, we demonstrate motion re-targeting and bullet-time rendering with the learned models.
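As a rough illustration of why an internal volumetric representation permits rendering in any pose and from any camera, the following minimal sketch marches a camera ray through space, maps each sample into a canonical frame, and alpha-composites the result. It is not the Vid2Actor network. The names `to_canonical`, `canonical_volume`, and `render_ray` are hypothetical, the volume is a hand-written sphere standing in for a learned one, and a single rigid transform stands in for full-body articulation.

```python
import math

def to_canonical(point, rotation, translation):
    """Map an observed-space point into the canonical frame: R^T (p - t).

    Assumption: the pose is one rigid transform (3x3 rotation, 3-vector
    translation), a simplification of an articulated body pose.
    """
    d = [point[i] - translation[i] for i in range(3)]
    # Multiplying by the transpose inverts a rotation matrix.
    return [sum(rotation[r][c] * d[r] for r in range(3)) for c in range(3)]

def canonical_volume(p):
    """Toy stand-in for a learned volume: an orange sphere of radius 0.5.

    Returns (density, rgb) at canonical-space point p.
    """
    inside = math.sqrt(sum(x * x for x in p)) < 0.5
    return (20.0 if inside else 0.0), (1.0, 0.5, 0.0)

def render_ray(origin, direction, rotation, translation,
               near=0.0, far=4.0, n_samples=64):
    """Front-to-back alpha compositing of samples along one camera ray."""
    step = (far - near) / n_samples
    transmittance = 1.0  # fraction of light not yet absorbed
    pixel = [0.0, 0.0, 0.0]
    for i in range(n_samples):
        t = near + (i + 0.5) * step
        p = [origin[k] + t * direction[k] for k in range(3)]
        sigma, rgb = canonical_volume(to_canonical(p, rotation, translation))
        alpha = 1.0 - math.exp(-sigma * step)
        for k in range(3):
            pixel[k] += transmittance * alpha * rgb[k]
        transmittance *= 1.0 - alpha
    return pixel, 1.0 - transmittance  # color and accumulated opacity

# Usage: the subject sits 2 units in front of the camera, unrotated.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
color, opacity = render_ray([0.0, 0.0, 0.0], [0.0, 0.0, 1.0],
                            identity, [0.0, 0.0, 2.0])
```

In this sketch, changing `rotation` and `translation` re-poses the subject, while changing `origin` and `direction` moves the camera. The stored volume is untouched in both cases, which is the property that separates a 3D representation from 2D image-to-image translation.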