CV | Feb 6, 2025

DICE: Distilling Classifier-Free Guidance into Text Embeddings

arXiv:2502.03726v1 | 3 citations | h-index: 19
Originality: Incremental advance
AI Analysis

This paper addresses the computational and theoretical drawbacks of classifier-free guidance for users of text-to-image models. It offers an incremental improvement: refining text embeddings so that they replicate the benefits of guidance.

The paper tackles the problem that text-to-image diffusion models often generate images that align poorly with their text prompts. It presents DICE, a method that distills classifier-free guidance into text embeddings to achieve high-quality, well-aligned image generation at a fast sampling speed, as demonstrated on Stable Diffusion v1.5, SDXL, and PixArt-α.

Text-to-image diffusion models are capable of generating high-quality images, but these images often fail to align closely with the given text prompts. Classifier-free guidance (CFG) is a popular and effective technique for improving text-image alignment in the generative process. However, using CFG introduces significant computational overhead and deviates from the established theoretical foundations of diffusion models. In this paper, we present DIstilling CFG by enhancing text Embeddings (DICE), a novel approach that removes the reliance on CFG in the generative process while maintaining the benefits it provides. DICE distills a CFG-based text-to-image diffusion model into a CFG-free version by refining text embeddings to replicate CFG-based directions. In this way, we avoid the computational and theoretical drawbacks of CFG, enabling high-quality, well-aligned image generation at a fast sampling speed. Extensive experiments on multiple Stable Diffusion v1.5 variants, SDXL and PixArt-$α$ demonstrate the effectiveness of our method. Furthermore, DICE supports negative prompts for image editing to improve image quality further. Code will be available soon.
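To make the idea in the abstract concrete, here is a minimal toy sketch, not the authors' implementation. It assumes a hypothetical noise predictor `eps` that is linear in the text embedding, which real diffusion networks are not. Under that assumption, the two-call CFG output can be reproduced exactly by a single call with a modified embedding. The names `eps`, `cfg_eps`, `refined_embedding`, and the guidance scale `w = 7.5` are all illustrative choices, and the closed-form embedding holds only for this linear toy. The abstract says only that DICE refines embeddings by distillation; it does not describe the mechanism.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a noise-prediction network: linear in both the noisy
# latent x and the text embedding c. Real diffusion models are nonlinear.
A = rng.normal(size=(4, 4))
B = rng.normal(size=(4, 3))

def eps(x, c):
    """Hypothetical noise predictor eps(x, c) = A x + B c."""
    return A @ x + B @ c

def cfg_eps(x, c, c_null, w):
    """Classifier-free guidance: two model calls per sampling step."""
    e_uncond = eps(x, c_null)   # call 1: unconditional (null prompt)
    e_cond = eps(x, c)          # call 2: conditional (text prompt)
    return e_uncond + w * (e_cond - e_uncond)

def refined_embedding(c, c_null, w):
    """For this linear toy only: the embedding whose single-call
    prediction equals the CFG prediction."""
    return c_null + w * (c - c_null)

x = rng.normal(size=4)        # noisy latent
c = rng.normal(size=3)        # text embedding
c_null = rng.normal(size=3)   # null-prompt embedding
w = 7.5                       # guidance scale (illustrative value)

guided = cfg_eps(x, c, c_null, w)                       # two calls
single_call = eps(x, refined_embedding(c, c_null, w))   # one call
max_gap = float(np.max(np.abs(guided - single_call)))
```

In this toy, `guided` and `single_call` agree up to floating-point error, which shows why moving the guidance into the embedding halves the number of model evaluations per step. For a nonlinear network no such closed form exists, so the refined embedding would have to be learned.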

