GR CV Aug 28, 2023

MagicAvatar: Multimodal Avatar Generation and Animation

Jianfeng Zhang, Hanshu Yan, Zhongcong Xu, Jiashi Feng, Jun Hao Liew

arXiv:2308.14748v113.024 citations h-index: 14

Originality: Incremental advance

AI Analysis

This work addresses avatar generation and animation for applications such as virtual reality and content creation. It appears incremental: it builds on existing multimodal methods, and its main addition is a two-stage approach.

MagicAvatar tackles multimodal avatar video generation by disentangling it into two stages: translating inputs into motion signals and then generating video from those signals, enabling avatar animation from a few images with demonstrated flexibility in text- and video-guided applications.

This report presents MagicAvatar, a framework for multimodal video generation and animation of human avatars. Unlike most existing methods, which generate avatar-centric videos directly from multimodal inputs (e.g., text prompts), MagicAvatar explicitly disentangles avatar video generation into two stages: (1) multimodal-to-motion and (2) motion-to-video generation. The first stage translates the multimodal inputs into motion/control signals (e.g., human pose, depth, DensePose), while the second stage generates avatar-centric video guided by these motion signals. Additionally, MagicAvatar supports avatar animation from just a few images of the target person. This capability enables the animation of the provided human identity according to the specific motion derived from the first stage. We demonstrate the flexibility of MagicAvatar through various applications, including text-guided and video-guided avatar generation, as well as multimodal avatar animation.
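
The two-stage decomposition described above can be sketched as a composition of two interchangeable functions. This is only an illustrative sketch, not the authors' implementation: every name here (MotionSignal, Frame, stub_multimodal_to_motion, stub_motion_to_video, generate_avatar_video) is hypothetical, and both stages are trivial stand-ins where a real system would call learned models.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence

# All names below are hypothetical placeholders, not the authors' API.


@dataclass
class MotionSignal:
    """One frame of a control signal (e.g., pose, depth, or DensePose)."""
    kind: str
    frame_index: int


@dataclass
class Frame:
    """One generated video frame, tagged with the signal that guided it."""
    index: int
    guided_by: str
    identity: Optional[str] = None


# Stage 1 maps a multimodal input (here just a text prompt) to motion signals.
MultimodalToMotion = Callable[[str, int], List[MotionSignal]]
# Stage 2 maps motion signals (plus an optional identity) to video frames.
MotionToVideo = Callable[[Sequence[MotionSignal], Optional[str]], List[Frame]]


def stub_multimodal_to_motion(prompt: str, num_frames: int) -> List[MotionSignal]:
    # Stand-in for stage 1: emits one "pose" signal per frame and ignores
    # the prompt content. A real model would derive motion from the prompt.
    return [MotionSignal(kind="pose", frame_index=i) for i in range(num_frames)]


def stub_motion_to_video(
    motion: Sequence[MotionSignal], identity: Optional[str] = None
) -> List[Frame]:
    # Stand-in for stage 2: produces one frame per motion signal.
    # The identity label stands in for "a few images of the target person".
    return [
        Frame(index=m.frame_index, guided_by=m.kind, identity=identity)
        for m in motion
    ]


def generate_avatar_video(
    prompt: str,
    num_frames: int,
    stage1: MultimodalToMotion = stub_multimodal_to_motion,
    stage2: MotionToVideo = stub_motion_to_video,
    identity: Optional[str] = None,
) -> List[Frame]:
    # The only interface between the stages is the list of motion signals,
    # so either stage can be swapped without touching the other.
    motion = stage1(prompt, num_frames)
    return stage2(motion, identity)


if __name__ == "__main__":
    frames = generate_avatar_video("a person dancing", 4, identity="subject_a")
    print(len(frames), frames[0])
```

Because the stages communicate only through motion signals, the same second stage could in principle serve text-guided generation, video-guided generation, and identity-specific animation by changing what feeds the first stage.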
