TexDreamer: Towards Zero-Shot High-Fidelity 3D Human Texture Generation
TexDreamer addresses the scarcity and uneven quality of 3D human texture data, a bottleneck for applications such as gaming and virtual reality. It is presented as a new method rather than an incremental improvement.
The paper tackles the challenge of generating high-fidelity 3D human textures from text or images by introducing TexDreamer, a zero-shot model that generates a texture within seconds, and ATLAS, a dataset of 50k high-resolution textures.
Texturing 3D humans with semantic UV maps remains a challenge due to the difficulty of acquiring reasonably unfolded UV. Despite recent text-to-3D advancements in supervising multi-view renderings using large text-to-image (T2I) models, issues persist with generation speed, text consistency, and texture quality, resulting in data scarcity among existing datasets. We present TexDreamer, the first zero-shot multimodal high-fidelity 3D human texture generation model. Utilizing an efficient texture adaptation finetuning strategy, we adapt a large T2I model to a semantic UV structure while preserving its original generalization capability. Leveraging a novel feature translator module, the trained model is capable of generating high-fidelity 3D human textures from either text or an image within seconds. Furthermore, we introduce ArTicuLated humAn textureS (ATLAS), the largest high-resolution (1024 × 1024) 3D human texture dataset, which contains 50k high-fidelity textures with text descriptions.
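For readers unfamiliar with UV texturing, the sketch below illustrates what a UV map is used for: each point on the mesh surface carries a (u, v) coordinate in [0, 1], and its color is looked up in the texture image at that coordinate. This is a generic illustration and not code from TexDreamer. The function name `sample_texture` and the convention that v = 0 corresponds to the bottom row of the image are assumptions made for this example.

```python
import numpy as np

def sample_texture(texture, uv):
    """Bilinearly sample an (H, W, C) texture at UV coordinates in [0, 1].

    uv has shape (N, 2): u runs left to right, v runs bottom to top
    (an assumed convention; some tools place v = 0 at the top instead).
    """
    texture = np.asarray(texture, dtype=np.float64)
    h, w = texture.shape[:2]
    uv = np.clip(np.asarray(uv, dtype=np.float64), 0.0, 1.0)

    # Convert UV coordinates to continuous pixel coordinates.
    x = uv[:, 0] * (w - 1)
    y = (1.0 - uv[:, 1]) * (h - 1)

    # Integer corners surrounding each sample point.
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)

    # Fractional offsets inside the pixel cell, broadcast over channels.
    fx = (x - x0)[:, None]
    fy = (y - y0)[:, None]

    # Blend horizontally along the top and bottom rows, then vertically.
    top = texture[y0, x0] * (1 - fx) + texture[y0, x1] * fx
    bottom = texture[y1, x0] * (1 - fx) + texture[y1, x1] * fx
    return top * (1 - fy) + bottom * fy
```

With a 1024 × 1024 texture such as those in ATLAS, `texture` would have shape (1024, 1024, 3), and `uv` would hold the per-vertex or per-fragment coordinates defined by the body template's UV layout.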