MVLight: Relightable Text-to-3D Generation via Light-conditioned Multi-View Diffusion
This work addresses a key problem in text-to-3D generation for applications that require realistic lighting control. It is an incremental advance that integrates lighting conditions into the generation process.
The paper tackles the challenge of decoupling light-independent and lighting-dependent components in text-to-3D generation to improve model quality and relighting performance. Experiments and a user study indicate improved geometric precision and relighting capability.
Recent advancements in text-to-3D generation, building on the success of high-performance text-to-image generative models, have made it possible to create imaginative and richly textured 3D objects from textual descriptions. However, a key challenge remains in effectively decoupling light-independent and lighting-dependent components to enhance the quality of generated 3D models and their relighting performance. In this paper, we present MVLight, a novel light-conditioned multi-view diffusion model that explicitly integrates lighting conditions into the generation process. This enables the model to synthesize high-quality images that faithfully reflect the specified lighting environment across multiple camera views. By leveraging this capability in Score Distillation Sampling (SDS), we can effectively synthesize 3D models with improved geometric precision and relighting capabilities. We validate the effectiveness of MVLight through extensive experiments and a user study.
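To make the SDS step concrete, here is a minimal NumPy sketch of a DreamFusion-style SDS gradient driven by a light-conditioned multi-view noise predictor. The abstract does not specify MVLight's architecture, lighting encoding, or loss weighting, so everything here is an assumption: `toy_mv_denoiser` is a hypothetical stand-in (not MVLight's network), lighting is a plain vector shared by all views, and the weighting w(t) = 1 - ᾱ_t is one common choice. The renders are treated as the parameters themselves, so the renderer Jacobian is the identity.

```python
import numpy as np


def toy_mv_denoiser(x_t, t, text_emb, cameras, light):
    """HYPOTHETICAL stand-in for a light-conditioned multi-view noise predictor
    eps_hat(x_t; text, cameras, light, t). Not MVLight's network: it only mimics
    the interface (K views denoised jointly, one shared lighting descriptor)."""
    cond = np.tanh(text_emb).mean() + np.tanh(light).mean()
    # Per-view shift from the camera pose, broadcast over (H, W, C).
    view_shift = np.tanh(cameras).mean(axis=1)[:, None, None, None]
    return 0.5 * x_t + 0.1 * (cond + view_shift)


def sds_grad(views, t, alphas_cumprod, text_emb, cameras, light, rng):
    """SDS gradient w.r.t. K rendered views of shape (K, H, W, C).

    Noise the renders to step t, query the conditioned noise predictor, and
    return w(t) * (eps_hat - eps), skipping the denoiser's Jacobian."""
    a_bar = alphas_cumprod[t]
    eps = rng.standard_normal(views.shape)
    x_t = np.sqrt(a_bar) * views + np.sqrt(1.0 - a_bar) * eps  # forward diffusion
    eps_hat = toy_mv_denoiser(x_t, t, text_emb, cameras, light)
    w_t = 1.0 - a_bar  # weighting choice (assumption)
    return w_t * (eps_hat - eps)


# Toy usage: linear beta schedule, 4 views of an 8x8 RGB image.
rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 0.02, 1000)
alphas_cumprod = np.cumprod(1.0 - betas)
K, H, W, C = 4, 8, 8, 3
theta = np.zeros((K, H, W, C))  # "3D model" parameters = rendered views here
text_emb = rng.standard_normal(16)
cameras = rng.standard_normal((K, 6))
light = rng.standard_normal(32)  # e.g. a flattened lighting code (assumption)

for step in range(10):
    t = rng.integers(20, 980)
    g = sds_grad(theta, t, alphas_cumprod, text_emb, cameras, light, rng)
    theta -= 0.1 * g  # plain gradient step on the parameters
```

In a real pipeline the views would come from a differentiable renderer of a 3D representation, and the gradient would be backpropagated through it. Passing a different `light` vector changes the gradient, which is the hook a light-conditioned model gives SDS.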