GR | CV | Aug 28, 2023

MagicAvatar: Multimodal Avatar Generation and Animation

arXiv:2308.14748v1 | 24 citations | h-index: 46
Originality: Incremental advance
AI Analysis

This work addresses avatar generation and animation for applications in virtual reality or content creation, but it appears incremental as it builds on existing multimodal methods by adding a two-stage approach.

MagicAvatar tackles multimodal avatar video generation by disentangling it into two stages: translating inputs into motion signals and then generating video from those signals, enabling avatar animation from a few images with demonstrated flexibility in text- and video-guided applications.

This report presents MagicAvatar, a framework for multimodal video generation and animation of human avatars. Unlike most existing methods, which generate avatar-centric videos directly from multimodal inputs (e.g., text prompts), MagicAvatar explicitly disentangles avatar video generation into two stages: (1) multimodal-to-motion and (2) motion-to-video generation. The first stage translates the multimodal inputs into motion/control signals (e.g., human pose, depth, DensePose), while the second stage generates avatar-centric video guided by these motion signals. Additionally, MagicAvatar supports avatar animation from just a few images of the target person, so that the provided human identity is animated according to the motion derived in the first stage. We demonstrate the flexibility of MagicAvatar through various applications, including text-guided and video-guided avatar generation, as well as multimodal avatar animation.
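
The two-stage split described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: every name here (`MotionSignal`, `Video`, `multimodal_to_motion`, `motion_to_video`, `generate_avatar_video`) is hypothetical, and both stages are string-producing placeholders standing in for learned models. It only shows how an intermediate motion signal decouples the input modality from video synthesis, and where the optional identity images would enter.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MotionSignal:
    """Intermediate control signal (hypothetical container)."""
    kind: str          # e.g. "pose", "depth", or "densepose"
    frames: List[str]  # one placeholder entry per video frame


@dataclass
class Video:
    """Placeholder for a generated clip."""
    frames: List[str]
    identity: Optional[str] = None


def multimodal_to_motion(prompt: str, num_frames: int = 4,
                         kind: str = "pose") -> MotionSignal:
    # Stage 1 (placeholder): a real model would predict one motion/control
    # frame per time step from the text, audio, or video input.
    frames = [f"{kind}[{i}] for '{prompt}'" for i in range(num_frames)]
    return MotionSignal(kind=kind, frames=frames)


def motion_to_video(motion: MotionSignal,
                    identity_images: Optional[List[str]] = None) -> Video:
    # Stage 2 (placeholder): a real model would synthesize frames guided by
    # the motion signal, optionally conditioned on a few subject images.
    identity = None
    if identity_images:
        identity = f"subject from {len(identity_images)} image(s)"
    frames = [f"frame rendered from {m}" for m in motion.frames]
    return Video(frames=frames, identity=identity)


def generate_avatar_video(prompt: str,
                          identity_images: Optional[List[str]] = None,
                          num_frames: int = 4) -> Video:
    # The stages only communicate through the motion signal, so either one
    # could be swapped out without touching the other.
    motion = multimodal_to_motion(prompt, num_frames=num_frames)
    return motion_to_video(motion, identity_images)
```

Under these assumptions, calling `generate_avatar_video("a person waving", ["a.png", "b.png"], num_frames=3)` returns a three-frame `Video` tagged with the supplied identity, while omitting the images leaves `identity` as `None`, mirroring the generation and animation use cases respectively.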

