Instruct-Video2Avatar: Video-to-Avatar Generation with Instructions
This work addresses the need for efficient avatar creation in digital media and VR/AR applications. It is an incremental improvement that combines existing techniques such as diffusion models and neural radiance fields.
The paper tackles the problem of generating edited photo-realistic 3D neural head avatars from monocular RGB videos using text instructions, achieving results that outperform state-of-the-art methods in quantitative and qualitative studies.
We propose a method for synthesizing edited photo-realistic digital avatars from text instructions. Given a short monocular RGB video and text instructions, our method uses an image-conditioned diffusion model to edit one head image and a video stylization method to propagate that edit to the remaining head images. Through iterative training and updating (three or more rounds), our method synthesizes edited photo-realistic animatable 3D neural head avatars with a deformable neural radiance field head synthesis method. In quantitative and qualitative studies on various subjects, our method outperforms state-of-the-art methods.
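The control flow described in the abstract can be sketched roughly as follows. This is only an illustrative outline, not the authors' implementation: the function name `build_edited_avatar` and the four callables (`edit_image`, `stylize_video`, `train_avatar`, `render_frames`) are hypothetical placeholders for the diffusion editor, the video stylization method, and the deformable NeRF head model. The abstract does not say what the "update" step consists of, so re-rendering frames from the current avatar and re-stylizing them is an assumption made here for illustration.

```python
from typing import Any, Callable, List, Sequence

Frame = Any   # placeholder type for one head image
Avatar = Any  # placeholder type for a trained head avatar


def build_edited_avatar(
    frames: Sequence[Frame],
    instruction: str,
    edit_image: Callable[[Frame, str], Frame],
    stylize_video: Callable[[Sequence[Frame], Frame, int], List[Frame]],
    train_avatar: Callable[[Sequence[Frame]], Avatar],
    render_frames: Callable[[Avatar, int], List[Frame]],
    rounds: int = 3,
    key_index: int = 0,
) -> Avatar:
    """Hypothetical outline: edit one frame, propagate, then train and update."""
    if not frames:
        raise ValueError("frames must not be empty")
    if rounds < 1:
        raise ValueError("rounds must be at least 1")

    # Step 1: edit a single head image with the text instruction.
    edited_key = edit_image(frames[key_index], instruction)

    # Step 2: propagate the edit to the other head images.
    targets = stylize_video(frames, edited_key, key_index)

    avatar = None
    for round_idx in range(rounds):
        # Step 3: fit the avatar to the current set of edited frames.
        avatar = train_avatar(targets)

        # Step 4 (assumed): refresh the training targets from the avatar's
        # own renders, skipping the refresh after the final round.
        if round_idx < rounds - 1:
            rendered = render_frames(avatar, len(frames))
            targets = stylize_video(rendered, edited_key, key_index)

    return avatar
```

Passing the components in as callables keeps the sketch independent of any particular diffusion model, stylization tool, or NeRF codebase; `rounds=3` reflects the "three times or more" stated in the abstract.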