CV · Jan 10, 2024

Score Distillation Sampling with Learned Manifold Corrective

arXiv:2401.05293v2 · 31 citations · h-index: 65 · ECCV
Originality: Incremental advance
AI Analysis

This work addresses a specific bottleneck in SDS for text-guided optimization problems, offering an incremental improvement with practical applications in image and 3D synthesis.

The paper tackles noisy gradients and unwanted side effects, such as oversaturation, in Score Distillation Sampling (SDS). It identifies an inherent issue in the SDS formulation and proposes a simple fix: a shallow network that factors out the timestep-dependent frequency bias of the image diffusion model. The result is improved performance across tasks such as image synthesis and text-to-3D synthesis.

Score Distillation Sampling (SDS) is a recent but already widely popular method that relies on an image diffusion model to control optimization problems using text prompts. In this paper, we conduct an in-depth analysis of the SDS loss function, identify an inherent problem with its formulation, and propose a surprisingly easy but effective fix. Specifically, we decompose the loss into different factors and isolate the component responsible for noisy gradients. In the original formulation, high text guidance is used to account for the noise, leading to unwanted side effects such as oversaturation or repeated detail. Instead, we train a shallow network mimicking the timestep-dependent frequency bias of the image diffusion model in order to effectively factor it out. We demonstrate the versatility and the effectiveness of our novel loss formulation through qualitative and quantitative experiments, including optimization-based image synthesis and editing, zero-shot image translation network training, and text-to-3D synthesis.
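To make the idea of decomposing the loss concrete, here is a minimal sketch of the standard SDS update direction under classifier-free guidance, split into two additive parts: a text-guidance term and an unconditional residual. The split is a plain algebraic identity and is not necessarily the exact factorization used in the paper. The names `toy_denoiser` and `sds_direction` are hypothetical, the denoiser is a fixed linear stand-in rather than a real diffusion model, and the sketch assumes the image is optimized directly with the timestep weighting set to 1.

```python
import random


def toy_denoiser(z_t, t, cond=None):
    """Hypothetical stand-in for a diffusion model's noise prediction.

    Not a real model: a fixed linear map, so the example runs without
    any weights. `t` is accepted only to mirror the usual interface.
    """
    shift = 0.0 if cond is None else 0.5
    return [0.9 * z + shift for z in z_t]


def sds_direction(x, t, alpha_bar, guidance_scale, rng):
    """Return the SDS update direction and its two additive parts.

    Assumptions (illustrative only): x is optimized directly, so the
    Jacobian dx/dtheta is the identity, and the weighting w(t) is 1.
    """
    # Forward diffusion: z_t = sqrt(alpha_bar) * x + sqrt(1 - alpha_bar) * eps
    eps = [rng.gauss(0.0, 1.0) for _ in x]
    a = alpha_bar ** 0.5
    s = (1.0 - alpha_bar) ** 0.5
    z_t = [a * xi + s * e for xi, e in zip(x, eps)]

    eps_uncond = toy_denoiser(z_t, t)
    eps_cond = toy_denoiser(z_t, t, cond="prompt")

    # Classifier-free guidance: eps_hat = eps_uncond + scale * (eps_cond - eps_uncond)
    cond_term = [guidance_scale * (c - u) for c, u in zip(eps_cond, eps_uncond)]
    eps_hat = [u + g for u, g in zip(eps_uncond, cond_term)]

    # Standard SDS direction: eps_hat - eps
    full = [eh - e for eh, e in zip(eps_hat, eps)]

    # Residual of the unconditional prediction against the sampled noise
    residual_term = [u - e for u, e in zip(eps_uncond, eps)]
    return full, cond_term, residual_term


if __name__ == "__main__":
    rng = random.Random(0)
    full, cond_term, residual_term = sds_direction(
        [0.1, -0.2, 0.3], t=0.5, alpha_bar=0.5, guidance_scale=7.5, rng=rng
    )
    # The full direction equals the sum of the two parts, up to rounding.
    print(all(abs(f - (c + r)) < 1e-9
              for f, c, r in zip(full, cond_term, residual_term)))
```

In this form only the guidance term grows with the guidance scale, while the residual term depends on the sampled noise and varies from draw to draw. This is consistent with the abstract's remark that high text guidance is used in the original formulation to account for the noise.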

