InstructHumans: Editing Animated 3D Human Textures with Instructions
This work addresses the challenge of instruction-driven 3D human texture editing for applications in animation and virtual reality, representing an incremental improvement over prior methods.
The paper tackles the problem of editing animated 3D human textures using text instructions, where existing methods based on Score Distillation Sampling (SDS) often destroy consistency with the source avatar. It proposes a modified SDS for Editing (SDS-E) that outperforms existing 3D editing methods by maintaining avatar consistency while reflecting textual edits.
We present InstructHumans, a novel framework for instruction-driven animatable 3D human texture editing. Existing text-based 3D editing methods often directly apply Score Distillation Sampling (SDS). SDS, designed for generation tasks, cannot account for the defining requirement of editing: maintaining consistency with the source avatar. This work shows that naively using SDS harms editing, as it may destroy consistency. We propose a modified SDS for Editing (SDS-E) that selectively incorporates subterms of SDS across diffusion timesteps. We further enhance SDS-E with spatial smoothness regularization and gradient-based viewpoint sampling for edits with sharp and high-fidelity detailing. Incorporating SDS-E into a 3D human texture editing framework allows us to outperform existing 3D editing methods. Our avatars faithfully reflect the textual edits while remaining consistent with the original avatars. Project page: https://jyzhu.top/instruct-humans/.
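For readers unfamiliar with what "subterms of SDS" could refer to, the following is a sketch of the standard SDS gradient and one common way to split it, under stated assumptions. The notation is not taken from this abstract: it assumes a rendered image $x = g(\theta)$, a noised image $x_t = \alpha_t x + \sigma_t \epsilon$, a noise predictor $\epsilon_\phi$, a condition $y$, a weighting $w(t)$, and classifier-free guidance with scale $s$. The abstract does not say which subterms SDS-E keeps at which timesteps, and the paper's own decomposition may differ from this one.

```latex
% Standard SDS gradient (assumed notation, not from the abstract)
\nabla_\theta \mathcal{L}_{\mathrm{SDS}}
  = \mathbb{E}_{t,\epsilon}\!\left[
      w(t)\,\bigl(\hat{\epsilon}_\phi(x_t; y, t) - \epsilon\bigr)\,
      \frac{\partial x}{\partial \theta}
    \right]

% Classifier-free guidance with scale s
\hat{\epsilon}_\phi(x_t; y, t)
  = \epsilon_\phi(x_t; \varnothing, t)
  + s\,\bigl(\epsilon_\phi(x_t; y, t) - \epsilon_\phi(x_t; \varnothing, t)\bigr)

% Substituting splits the residual into two subterms
\hat{\epsilon}_\phi(x_t; y, t) - \epsilon
  = \underbrace{\epsilon_\phi(x_t; \varnothing, t) - \epsilon}_{\text{unconditional residual}}
  + \underbrace{s\,\bigl(\epsilon_\phi(x_t; y, t) - \epsilon_\phi(x_t; \varnothing, t)\bigr)}_{\text{guidance direction}}
```

Under this reading, a timestep-dependent scheme would weight or drop each subterm as a function of $t$ rather than always using their sum.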