Text-to-3D with Classifier Score Distillation
This work addresses the problem of generating 3D content from text for applications in graphics and AI, offering a novel perspective that re-evaluates a key component in existing techniques.
The paper tackles text-to-3D generation by showing that classifier-free guidance alone is sufficient for effective optimization, achieving results superior to those of state-of-the-art methods on tasks such as shape generation and texture synthesis.
Text-to-3D generation has made remarkable progress recently, particularly with methods based on Score Distillation Sampling (SDS), which leverages pre-trained 2D diffusion models. While classifier-free guidance is widely acknowledged to be crucial for successful optimization, it is considered an auxiliary trick rather than the most essential component. In this paper, we re-evaluate the role of classifier-free guidance in score distillation and make a surprising finding: the guidance alone is enough for effective text-to-3D generation. We name this method Classifier Score Distillation (CSD), which can be interpreted as using an implicit classification model for generation. This perspective yields new insights into existing techniques. We validate the effectiveness of CSD across a variety of text-to-3D tasks, including shape generation, texture synthesis, and shape editing, achieving results superior to those of state-of-the-art methods. Our project page is https://xinyu-andy.github.io/Classifier-Score-Distillation
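To make the claim that "the guidance alone is enough" concrete, the following is a minimal sketch, not the authors' implementation. It assumes the common formulation in which the SDS residual splits into a prior term (conditional noise prediction minus the injected noise) plus a guidance scale times a classifier term (conditional minus unconditional prediction); CSD is then read as keeping only the classifier term. The function names, the guidance scale, and the random arrays standing in for diffusion-model outputs are all hypothetical.

```python
import numpy as np

def sds_residual(eps_cond, eps_uncond, noise, guidance_scale):
    """SDS-style residual with classifier-free guidance (assumed decomposition)."""
    prior_term = eps_cond - noise            # pulls the render toward the data prior
    classifier_term = eps_cond - eps_uncond  # the classifier-free guidance direction
    return prior_term + guidance_scale * classifier_term

def csd_residual(eps_cond, eps_uncond):
    """CSD-style residual: keep only the guidance (implicit classifier) term."""
    return eps_cond - eps_uncond

# Random stand-ins for the noise predictions of a pre-trained 2D diffusion model.
rng = np.random.default_rng(0)
shape = (4, 8, 8)                          # hypothetical latent shape
noise = rng.standard_normal(shape)         # noise injected into the rendered image
eps_cond = rng.standard_normal(shape)      # prediction given the text prompt
eps_uncond = rng.standard_normal(shape)    # prediction with an empty prompt
scale = 100.0                              # hypothetical guidance scale

sds = sds_residual(eps_cond, eps_uncond, noise, scale)
csd = csd_residual(eps_cond, eps_uncond)

# Under this decomposition, SDS and scaled CSD differ only by the prior term.
gap = sds - scale * csd
```

In an actual pipeline, either residual would be multiplied by a timestep weight and backpropagated through the differentiable renderer to the 3D parameters; that step is omitted here.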