CV · Dec 12, 2022

Rodin: A Generative Model for Sculpting 3D Digital Avatars Using Diffusion

Microsoft
arXiv:2212.06135v1 · 380 citations · h-index: 74
Originality: Highly original
AI Analysis

This work addresses the problem of computational inefficiency in 3D avatar generation for applications in digital media and AI, offering a novel method that enables high-fidelity and editable avatars.

The paper tackles the prohibitive memory and processing costs of generating high-quality 3D digital avatars. It proposes Rodin, a diffusion-based model that represents a neural radiance field as multiple 2D feature maps and rolls them out into a single 2D plane for efficient 3D-aware diffusion. The resulting avatars are highly detailed, with realistic hairstyles and facial hair such as beards.

This paper presents a 3D generative model that uses diffusion models to automatically generate 3D digital avatars represented as neural radiance fields. A significant challenge in generating such avatars is that the memory and processing costs in 3D are prohibitive for producing the rich details required for high-quality avatars. To tackle this problem we propose the roll-out diffusion network (Rodin), which represents a neural radiance field as multiple 2D feature maps and rolls out these maps into a single 2D feature plane within which we perform 3D-aware diffusion. The Rodin model brings the much-needed computational efficiency while preserving the integrity of diffusion in 3D by using 3D-aware convolution that attends to projected features in the 2D feature plane according to their original relationship in 3D. We also use latent conditioning to orchestrate the feature generation for global coherence, leading to high-fidelity avatars and enabling their semantic editing based on text prompts. Finally, we use hierarchical synthesis to further enhance details. The 3D avatars generated by our model compare favorably with those produced by existing generative techniques. We can generate highly detailed avatars with realistic hairstyles and facial hair like beards. We also demonstrate 3D avatar generation from image or text as well as text-guided editability.
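The roll-out step described in the abstract can be illustrated with a small sketch. It assumes the "multiple 2D feature maps" are three axis-aligned planes (xy, yz, xz) of equal resolution that are concatenated side by side; the abstract does not specify the number of maps, their layout, or how features are queried, so the function names `roll_out`, `unroll`, and `query_features`, the nearest-neighbour lookup, and the summation of per-plane features are all hypothetical choices made for illustration. The 3D-aware convolution and the diffusion model itself are not shown.

```python
import numpy as np


def roll_out(planes):
    """Concatenate three (C, N, N) feature planes into one (C, N, 3N) plane."""
    xy, yz, xz = planes
    return np.concatenate([xy, yz, xz], axis=-1)


def unroll(rolled):
    """Inverse of roll_out: split a (C, N, 3N) plane into three (C, N, N) planes."""
    return tuple(np.split(rolled, 3, axis=-1))


def query_features(planes, point):
    """Nearest-neighbour feature lookup for a 3D point in [0, 1)^3.

    Assumption: the point is projected onto each axis-aligned plane and the
    three sampled feature vectors are summed.
    """
    xy, yz, xz = planes
    n = xy.shape[-1]
    # Convert continuous coordinates to integer cell indices, clamped to the grid.
    i, j, k = (min(int(c * n), n - 1) for c in point)
    return xy[:, i, j] + yz[:, j, k] + xz[:, i, k]


rng = np.random.default_rng(0)
C, N = 4, 8  # illustrative channel count and plane resolution
planes = tuple(rng.standard_normal((C, N, N)) for _ in range(3))

rolled = roll_out(planes)
print(rolled.shape)  # (4, 8, 24)

feature = query_features(planes, (0.5, 0.5, 0.5))
print(feature.shape)  # (4,)
```

Under these assumptions the storage argument is simple arithmetic: a dense feature volume at resolution N needs N^3 cells per channel, while three planes need 3N^2. At N = 256 that is 16,777,216 cells versus 196,608, roughly 85 times fewer, and the rolled-out plane can be processed with ordinary 2D operations.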

