Stable Score Distillation
This work addresses stability and precision problems in text-guided image and 3D editing, and represents a strong incremental improvement over existing methods.
The paper tackles stability, spatial control, and editing strength issues in text-guided image and 3D editing by introducing Stable Score Distillation (SSD), which anchors a classifier to the source prompt and uses a null-text branch for stabilization. It achieves state-of-the-art results in 2D and 3D editing tasks with faster convergence and reduced complexity.
Text-guided image and 3D editing have advanced with diffusion-based models, yet methods like Delta Denoising Score often struggle with stability, spatial control, and editing strength. These limitations stem from reliance on complex auxiliary structures, which introduce conflicting optimization signals and restrict precise, localized edits. We introduce Stable Score Distillation (SSD), a streamlined framework that enhances stability and alignment in the editing process by anchoring a single classifier to the source prompt. Specifically, SSD uses the Classifier-Free Guidance (CFG) equation to achieve cross-prompt alignment and introduces a constant-term null-text branch to stabilize the optimization process. This approach preserves the original content's structure and ensures that editing trajectories are closely aligned with the source prompt, enabling smooth, prompt-specific modifications while maintaining coherence in surrounding regions. Additionally, SSD incorporates a prompt enhancement branch to boost editing strength, particularly for style transformations. Our method achieves state-of-the-art results in 2D and 3D editing tasks, including NeRF and text-driven style edits, with faster convergence and reduced complexity, providing a robust and efficient solution for text-guided editing.
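For readers unfamiliar with the CFG equation mentioned in the abstract, the sketch below shows the generic classifier-free guidance combination of a null-text (unconditional) and a text-conditioned noise prediction. It is background only and is not the SSD objective, which the abstract does not spell out. The function name `cfg_combine`, the toy input values, and the guidance scale are illustrative assumptions.

```python
from typing import List, Sequence


def cfg_combine(
    eps_uncond: Sequence[float],
    eps_cond: Sequence[float],
    guidance_scale: float,
) -> List[float]:
    """Generic classifier-free guidance: eps_uncond + w * (eps_cond - eps_uncond).

    Hypothetical helper for illustration; not taken from the SSD paper.
    """
    if len(eps_uncond) != len(eps_cond):
        raise ValueError("noise predictions must have the same length")
    return [u + guidance_scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


# Toy noise predictions standing in for a diffusion model's outputs
# (made-up values, flattened to a short list for readability).
eps_null = [0.0, 1.0, -1.0]   # prediction with the null-text (empty) prompt
eps_text = [1.0, 1.0, 0.0]    # prediction with a text prompt

guided = cfg_combine(eps_null, eps_text, guidance_scale=2.0)
print(guided)
```

With a guidance scale of 1 the result equals the text-conditioned prediction, and with a scale of 0 it equals the null-text prediction; larger scales push the result further along the direction from the null-text prediction toward the text-conditioned one.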