Multimodal Generation of Animatable 3D Human Models with AvatarForge
AvatarForge addresses the challenge of creating customizable, animatable human avatars for artistic creation and animation, proposing a novel method for a known bottleneck.
The paper tackles the problem of generating high-quality, animatable 3D human avatars from text or image inputs. It overcomes limitations of existing methods by combining LLM-based reasoning with 3D generators, and it outperforms state-of-the-art methods in the reported evaluations.
We introduce AvatarForge, a framework for generating animatable 3D human avatars from text or image inputs using AI-driven procedural generation. While diffusion-based methods have made strides in general 3D object generation, they struggle to produce high-quality, customizable human avatars because of the complexity and diversity of human body shapes and poses, a difficulty exacerbated by the scarcity of high-quality data. Animating these avatars also remains a significant challenge for existing methods. AvatarForge overcomes these limitations by combining LLM-based commonsense reasoning with off-the-shelf 3D human generators, enabling fine-grained control over body and facial details. Unlike diffusion models, which often depend on their training data and offer little precise control over individual human features, AvatarForge takes a more flexible approach that brings humans into the iterative design and modeling loop. Its auto-verification system allows the generated avatars to be refined continuously, promoting high accuracy and customization. Our evaluations show that AvatarForge outperforms state-of-the-art methods in both text-to-avatar and image-to-avatar generation, making it a versatile tool for artistic creation and animation.
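As an illustration only, the propose, verify, and refine cycle described above can be sketched as a simple loop. This is a minimal sketch under stated assumptions, not the AvatarForge implementation: `refine_avatar`, `toy_propose`, `toy_verify`, the parameter names, the scoring rule, and the numeric feedback format are all hypothetical. In a real system the proposer would stand in for an LLM that maps a prompt to generator parameters, and the verifier would compare a rendered avatar against the input.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

Params = Dict[str, float]      # hypothetical generator parameters
Feedback = Dict[str, float]    # hypothetical per-parameter corrections


@dataclass
class RefinementResult:
    params: Params
    score: float
    history: List[float] = field(default_factory=list)


def refine_avatar(
    propose: Callable[[str, Optional[Params], Feedback], Params],
    verify: Callable[[str, Params], Tuple[float, Feedback]],
    prompt: str,
    max_rounds: int = 5,
    threshold: float = 0.9,
) -> RefinementResult:
    """Propose parameters, score them, and re-propose until the score is good enough."""
    params = propose(prompt, None, {})
    score, feedback = verify(prompt, params)
    history = [score]
    best_params, best_score = params, score

    for _ in range(max_rounds - 1):
        if score >= threshold:
            break
        # Feed the verifier's corrections back into the proposer.
        params = propose(prompt, params, feedback)
        score, feedback = verify(prompt, params)
        history.append(score)
        if score > best_score:
            best_params, best_score = params, score

    return RefinementResult(best_params, best_score, history)


# Toy stand-ins (assumptions): a fixed target plays the role of "what the prompt asks for".
TARGET: Params = {"height": 1.8, "shoulder_width": 0.5}


def toy_propose(prompt: str, previous: Optional[Params], feedback: Feedback) -> Params:
    if previous is None:
        return {"height": 1.6, "shoulder_width": 0.4}  # arbitrary first guess
    # Move halfway along each suggested correction.
    return {k: v + 0.5 * feedback.get(k, 0.0) for k, v in previous.items()}


def toy_verify(prompt: str, params: Params) -> Tuple[float, Feedback]:
    feedback = {k: TARGET[k] - params[k] for k in TARGET}
    error = sum(abs(d) for d in feedback.values())
    return max(0.0, 1.0 - error), feedback
```

Calling `refine_avatar(toy_propose, toy_verify, "a tall athlete")` runs the loop with the toy stand-ins. Separating the proposer from the verifier keeps the loop itself independent of any particular LLM or 3D generator, which is the design property the paragraph above emphasizes.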